(57) Abstract

A free-lying cell classifier. An automated microscope
system (511) comprising a computer (540) and high speed
processing field of view processors (568) identifies free-lying
cells (80, 82). An image (11) of a biological specimen is
obtained and the image (11) is segmented (10) to create a
set of binary masks (15). The binary masks (15) are used
by a feature calculator (12) to compute the features that
characterize objects of interest (80, 82) including free-lying
cells, artifacts and other biological objects. The objects (80,
82) are classified to identify their type, their normality or
abnormality or their identification as an artifact. The results
are summarized and reported (18). A stain evaluation (20) of
the slide is performed as well as a typicality evaluation (22).
The robustness (24) of the measurement is also quantified as
a classification confidence value (216). The free-lying cell
evaluation is used by an automated cytology system (500) to
classify a biological specimen slide.



[Front-page drawing: flowchart of the single cell algorithm.
SEGMENT IMAGE: locate the potential nuclei (outputs: image, masks).
FEATURE CALCULATION: compute the features that characterize each object.
CLASSIFY OBJECTS: classify each object to identify its type: normal,
abnormal, or artifact.
STAIN EVALUATION (20): measure the stain of objects classified as
intermediate cells.
SIL ATYPICALITY (22): measure the confidence of classification for a
set of alarms.
ROBUSTNESS (24): measure the segmentation and classification confidence.
MISCELLANEOUS MEASUREMENTS (26): make other various measurements about
the objects in the FOV.
SUMMARIZE RESULTS: accumulate the results of the single cell algorithm
and return them to the system.]






WO 96/09605 



PCT/US95/11492



APPARATUS FOR THE IDENTIFICATION OF FREE-LYING CELLS

The invention relates to an automated cytology
system and more particularly to an automated cytology
system that identifies and classifies free-lying cells and
cells having isolated nuclei on a biological specimen
slide.

BACKGROUND OF THE INVENTION 

One goal of a Papanicolaou smear analysis system
is to emulate the well established human review
process which follows standards suggested by The
Bethesda System. A trained cytologist views a slide
at low magnification to identify areas of interest,
then switches to higher magnification where it is
possible to distinguish normal cells from potentially
abnormal ones according to changes in their structure
and context. In much the same way as a human reviews
Papanicolaou smears, it would be desirable for an
automated cytology analysis system to view slides at
low magnification to detect possible areas of
interest, and at high magnification to locate possible
abnormal cells. As a cytologist compares size, shape,
texture, context and density of cells against
established criteria, so it would be desirable to
analyze cells according to pattern recognition
criteria established during a training period.

SUMMARY OF THE INVENTION 
The invention identifies and classifies free-lying
cells and cells having isolated nuclei on a biological
specimen: single cells. Objects that appear as single
cells bear the most significant diagnostic information
in a pap smear. Objects that appear as single cells may
be classified as being either normal cells, abnormal
cells, or artifacts. The invention also provides a
confidence level indicative of the likelihood that an
object has been correctly identified and classified.
The confidence level allows the rejection of slides
having only a few confident abnormal cells. The
staining characteristics of the slide are also
evaluated. The invention first acquires an image of the
biological specimen at a predetermined magnification.
Objects found in the image are identified and
classified. This information is used for subsequent
slide classification.

In one embodiment, the invention utilizes a set
of statistical decision processes that identify
potentially neoplastic cells in Papanicolaou-stained
cervical/vaginal smears. The decisions in accordance
with the invention as to whether an individual cell is
normal or potentially neoplastic are used to determine
if a slide is clearly normal or requires human review.

The apparatus of the invention uses nuclear and
cytoplasm detection with classification techniques to
identify free-lying cells and cells having isolated
nuclei. The apparatus of the invention can detect
squamous intraepithelial lesion (SIL) or other cancer
cells.

In addition to the detection and classification
of single cells, the invention measures the specimen
stain on objects which are classified as intermediate
squamous cells. Also, many measures are made of the
confidence with which objects are classified in the
single cell algorithm. All of this information is used
in conjunction with the number of potentially
neoplastic cells to determine a final slide score.

The invention performs three levels of processing:
image segmentation, feature extraction, and object
classification.




Other objects, features and advantages of the
present invention will become apparent to those
skilled in the art through the description of the
preferred embodiment, claims and drawings herein
wherein like numerals refer to like elements.

BRIEF DESCRIPTION OF THE DRAWINGS 
To illustrate this invention, a preferred
embodiment will be described herein with reference to
the accompanying drawings.

Figures 1A, 1B and 1C show the automated cytology
screening apparatus of the invention.

Figure 2 shows the method of the invention to
arrive at a classification result from an image.

Figure 3A shows the segmentation method of the
invention.

Figure 3B shows the contrast enhancement method
of the invention.

Figures 3C and 3D show a plot of pixels vs.
brightness.

Figure 3E shows the dark edge incorporated image
method of the invention.

Figure 3F shows the bright edge removal method of
the invention.

Figures 3G, 3H and 3I show refinement of an image
by small hole removal.

Figure 4A shows the feature extraction and object
classification of the invention.

Figure 4B shows an initial box filter.

Figure 4C shows a stage 1 classifier.

Figure 4D shows a stage 2 classifier.

Figure 4E shows a stage 3 classifier.

Figures 4F and 4G show an error graph.

Figure 5 shows a stain histogram.

Figure 6A shows robust and non-robust objects.

Figure 6B shows a decision boundary.

Figure 6C shows a segmented object.

Figure 7A shows a threshold graph.

Figure 7B shows a binary decision tree.

Figure 8 shows a stage 4 classifier.

Figure 9 shows a ploidy classifier.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT 
In a presently preferred embodiment of the
invention, the system disclosed herein is used in a
system for analyzing cervical pap smears, such as that
shown and disclosed in U.S. Patent Application Serial
No. 07/838,064, entitled "Method For Identifying
Normal Biomedical Specimens", by Alan C. Nelson, et
al., filed February 18, 1992; U.S. Patent Application
Serial No. 08/179,812 filed January 10, 1994 which is
a continuation in part of U.S. Patent Application
Serial No. 07/838,395, entitled "Method For
Identifying Objects Using Data Processing Techniques",
by S. James Lee, et al., filed February 18, 1992; U.S.
Patent Application Serial No. 07/838,070, now U.S.
Pat. No. 5,315,700, entitled "Method And Apparatus For
Rapidly Processing Data Sequences", by Richard S.
Johnston, et al., filed February 18, 1992; U.S. Patent
Application Serial No. 07/838,065, filed 02/18/92,
entitled "Method and Apparatus for Dynamic Correction
of Microscopic Image Signals" by Jon W. Hayenga, et
al.; and U.S. Patent Application Serial No.
08/302,355, filed September 7, 1994 entitled "Method
and Apparatus for Rapid Capture of Focused Microscopic
Images" to Hayenga, et al., which is a continuation-
in-part of Application Serial No. 07/838,063 filed on
February 18, 1992, the disclosures of which are
incorporated herein, in their entirety, by the
foregoing references thereto.

The present invention is also related to
biological and cytological systems as described in the
following patent applications which are assigned to
the same assignee as the present invention, filed on
September 20, 1994 unless otherwise noted, and which
are all hereby incorporated by reference including
U.S. Patent Application Serial No. 08/309,118, to Kuan
et al. entitled, "Field Prioritization Apparatus and
Method," U.S. Patent Application Serial No.
08/309,061, to Wilhelm et al., entitled "Apparatus for
Automated Identification of Cell Groupings on a
Biological Specimen," U.S. Patent Application Serial
No. 08/309,116 to Meyer et al. entitled "Apparatus for
Automated Identification of Thick Cell Groupings on a
Biological Specimen," U.S. Patent Application Serial
No. 08/309,115 to Lee et al. entitled "Biological
Analysis System Self Calibration Apparatus," U.S.
Patent Application Serial No. 08/308,992, to Lee et
al. entitled "Apparatus for Identification and
Integration of Multiple Cell Patterns," U.S. Patent
Application Serial No. 08/309,063 to Lee et al.
entitled "A Method for Cytological System Dynamic
Normalization," U.S. Patent Application Serial No.
08/309,248 to Rosenlof et al. entitled "Method and
Apparatus for Detecting a Microscope Slide Coverslip,"
U.S. Patent Application Serial No. 08/309,077 to
Rosenlof et al. entitled "Apparatus for Detecting
Bubbles in Coverslip Adhesive," U.S. Patent
Application Serial No. 08/309,931, to Lee et al.
entitled "Cytological Slide Scoring Apparatus," U.S.
Patent Application Serial No. 08/309,148 to Lee et al.
entitled "Method and Apparatus for Image Plane
Modulation Pattern Recognition," U.S. Patent
Application Serial No. 08/309,209 to Oh et al.
entitled "A Method and Apparatus for Robust Biological
Specimen Classification," U.S. Patent Application
Serial No. 08/309,117, to Wilhelm et al. entitled
"Method and Apparatus for Detection of Unsuitable
Conditions for Automated Cytology Scoring."
It is to be understood that the various processes
described herein may be implemented in software
suitable for running on a digital processor. The
software may be embedded, for example, in the central
processor 540.

Figures 1A, 1B and 1C show a schematic diagram of
one embodiment of the apparatus of the invention for
field of view prioritization. The apparatus of the
invention comprises an imaging system 502, a motion
control system 504, an image processing system 536, a
central processing system 540, and a workstation 542.
The imaging system 502 is comprised of an illuminator
508, imaging optics 510, a CCD camera 512, an
illumination sensor 514 and an image capture and focus
system 516. The image capture and focus system 516
provides video timing data to the CCD cameras 512, and
the CCD cameras 512 provide images comprising scan
lines to the image capture and focus system 516. An
illumination sensor intensity is provided to the image
capture and focus system 516 where an illumination
sensor 514 receives the sample of the image from the
optics 510. In one embodiment of the invention, the
optics may further comprise an automated microscope
511. The illuminator 508 provides illumination of a
slide. The image capture and focus system 516
provides data to a VME bus 538. The VME bus
distributes the data to an image processing system
536. The image processing system 536 is comprised of
field-of-view processors 568. The images are sent
along the image bus 564 from the image capture and
focus system 516. A central processor 540 controls
the operation of the invention through the VME bus
538. In one embodiment the central processor 562
comprises a MOTOROLA 68030 CPU. The motion
controller 504 is comprised of a tray handler 518, a
microscope stage controller 520, a microscope tray
controller 522, and a calibration slide 524. The
motor drivers 526 position the slide under the optics.
A bar code reader 528 reads a barcode located on the
slide 524. A touch sensor 530 determines whether a
slide is under the microscope objectives, and a door
interlock 532 prevents operation in case the doors are
open. Motion controller 534 controls the motor
drivers 526 in response to the central processor 540.
An Ethernet communication system 560 communicates to
a workstation 542 to provide control of the system.
A hard disk 544 is controlled by workstation 550. In
one embodiment, workstation 550 may comprise a SUN
SPARC CLASSIC (TM) workstation. A tape drive 546 is
connected to the workstation 550 as well as a modem
548, a monitor 552, a keyboard 554, and a mouse
pointing device 556. A printer 558 is connected to
the Ethernet 560.

During object identification and classification,
the central computer 540, running a real time
operating system, controls the microscope 511 and the
processor to acquire and digitize images from the
microscope 511. The flatness of the slide may be
checked, for example, by contacting the four corners
of the slide using a computer controlled touch sensor.
The computer 540 also controls the microscope 511
stage to position the specimen under the microscope
objective, and from one to fifteen field of view (FOV)
processors 568 which receive images under control of
the computer 540.

The computer system 540 accumulates results from
the 4x process and performs bubble edge detection,
which ensures that all areas inside bubbles are




excluded from processing by the invention. Imaging 
characteristics are degraded inside bubbles and tend 
to introduce false positive objects. Excluding these 
areas eliminates such false positives. 

The apparatus of the invention checks that cover 
slip edges are detected and that all areas outside of 
the area bounded by cover slip edges are excluded from 
image processing by the 20x process. Since the 
apparatus of the invention was not trained to 
recognize artifacts outside of the cover slipped area, 
excluding these areas eliminates possible false 

positive results. 

The computer system 540 accumulates slide level
20x results for the slide scoring process. The
computer system 540 performs image acquisition and
ensures that 20x images passed to the apparatus of the
invention for processing conform to image quality and
focus specifications. This ensures that no unexpected
imaging characteristics occur.

The invention performs three major steps, all of
which are described in greater detail below:

Step 1 - For each 20x FOV (20x objective
         magnification field of view), the
         algorithm segments potential cell
         nuclei and detects their cytoplasm
         boundaries. This step is called image
         segmentation.

Step 2 - Next, the algorithm measures feature
         values - such as size, shape, density,
         and texture - for each potential cell
         nucleus detected during Step 1. This
         step is called feature extraction.

Step 3 - The algorithm classifies each detected
         object in an FOV using the extracted
         feature values obtained in Step 2.
         This step is called object
         classification. Classification rules
         are defined and derived during
         algorithm training.

In addition to the object classification, other
measures are made during the classification process
which characterize the stain of the slide, and measure
the confidence of classification.

The single cell identification and classification
system of the invention was trained from a cell
library of training slides.

The apparatus of the invention uses multiple
layers of processing. As image data is processed by
the apparatus of the invention, it passes through
various stages, with each stage applying filters and
classifiers which provide finer and finer
discrimination. The result is that most of the
clearly normal cells and artifacts are eliminated by
the early stages of the classifier. The objects that
are more difficult to classify are reserved for the
later and more powerful stages of the classifier.
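
The layered processing described above can be pictured as a cascade in
which each stage either makes a final decision on an object or defers it
to the next stage. The following is a minimal sketch under stated
assumptions, not the classifier of the invention: the stage functions,
the feature names ("area", "density") and the thresholds are all
hypothetical and chosen only for illustration.

```python
from typing import Callable, Dict, List, Optional

# A stage returns a final label ("normal" or "artifact") to eliminate an
# object, or None to defer the object to the next, finer stage.
Stage = Callable[[Dict[str, float]], Optional[str]]


def run_cascade(features: Dict[str, float], stages: List[Stage]) -> str:
    """Pass one object through the stages in order; stop at the first decision."""
    for stage in stages:
        label = stage(features)
        if label is not None:
            return label
    # Objects that no stage could eliminate remain as potential alarms.
    return "abnormal"


# Hypothetical stages with made-up thresholds (assumptions, not from the text).
def size_gate(f: Dict[str, float]) -> Optional[str]:
    return "artifact" if f["area"] < 20 else None


def density_gate(f: Dict[str, float]) -> Optional[str]:
    return "normal" if f["density"] < 0.3 else None


STAGES: List[Stage] = [size_gate, density_gate]
```

In this arrangement cheap tests run first, so later stages only see the
objects that earlier stages could not decide.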

During classifier development, the computer
system 540 provides the invention with an image and
allocates space for storing the features calculated on
each object and the results of the apparatus of the
invention. The apparatus of the invention identifies
the potential nuclei in the image, computes features
for each object, creates results, and stores the
results in the appropriate location.

During classifier development, the apparatus of
the invention calculates and stores over 100 features
associated with each object to be entered into the
object classifier training database. Additionally,
the apparatus of the invention stores object truth
information provided by expert cytologists for each
object in the training database. Developers use
statistical feature analysis methods to select
features of utility for classifier design. Once
classifiers have been designed and implemented, the
apparatus of the invention calculates the selected
features and uses them to generate classification
results, confidence values, and stain measures.

Refer now to Figure 2 which shows the item
decomposition steps of the invention. In one
embodiment of the invention, the computer system 540
processes a 20x magnification field of view (FOV).
Steps 10, 12, 14 and 18 are functions that apply to
all objects in the image. Steps 20, 22, 24 and 26 are
performed only if certain conditions are met. For
example, stain evaluation 20 takes place only on
objects that are classified as intermediate cells.

The first processing step is image segmentation
10 that identifies objects of interest, or potential
cell nuclei, and prepares a mask 15 to identify the
nucleus and cytoplasm boundaries of the objects.

Features are then calculated 12 using the
original image 11 and the mask 15. The features are
calculated in feature calculation step 12 for each
object as identified by image segmentation 10.
Features are calculated only for objects that are at
least ten pixels away from the edge of the image 11.
The feature values computed for objects that are
closer to the edge of the image 11 are corrupted
because some of the morphological features need more
object area to be calculated accurately.
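
The ten-pixel edge rule above can be sketched as a simple filter applied
before feature calculation. This is only an illustration: the text does
not say how the distance to the image edge is measured, so the sketch
assumes it is measured from each object's bounding box. The names
`far_from_edge` and `objects_to_measure` are hypothetical.

```python
from typing import List, Tuple

# Bounding box as (row_min, col_min, row_max, col_max), inclusive pixel indices.
BBox = Tuple[int, int, int, int]

EDGE_MARGIN = 10  # pixels; the minimum distance stated in the text


def far_from_edge(bbox: BBox, height: int, width: int,
                  margin: int = EDGE_MARGIN) -> bool:
    """True if every side of the box is at least `margin` pixels from the edge."""
    r0, c0, r1, c1 = bbox
    return (r0 >= margin and c0 >= margin
            and (height - 1 - r1) >= margin
            and (width - 1 - c1) >= margin)


def objects_to_measure(bboxes: List[BBox], height: int, width: int) -> List[BBox]:
    """Keep only the objects for which features would be calculated."""
    return [b for b in bboxes if far_from_edge(b, height, width)]
```

Objects dropped by this filter are still segmented, which is why a
segmented object count can differ from the number of objects classified.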

Based on the feature calculation step 12, each
object is classified in classification step 14 as a
normal cell, an abnormal cell, or an artifact. At
various stages throughout the classification process,
several other measurements are made dependent on the
classification results of the objects:
o The stain evaluation step 20 measures stain
  related features on any object that has been
  identified as an intermediate cell.
o An SIL atypicality process 22 measures the
  confidence of objects that were classified as
  potentially abnormal.
o A robustness process 24 refers to the
  segmentation and classification. The robustness
  process 24 measures identified objects that are
  susceptible to poor classification results
  because they are poorly segmented or their
  feature values lie close to a decision boundary
  in a classifier.
o A miscellaneous measurements process 26 includes
  histograms of confidences from the classifiers,
  histograms of the stain density of objects
  classified as abnormal, or proximity measurements
  of multiple abnormal objects in one image.

The results of the above processes are summarized
in step 18. The numbers of objects classified as
normal, abnormal, or artifact at each classification
stage are counted, and the results from each of the
other measures are totaled. These results are
returned to the system where they are added to the
results of the other processed images. In total,
these form the results of the entire slide.

The 20x magnification images are obtained at a
pixel size of 0.55 x 0.55 microns. The computer 540
stores the address of the memory where the features
computed for the objects in the FOV will be stored.
The computer also stores the address of the memory
location where the results structure resides. This
memory will be filled with the results of the
invention.

The computer system 540 outputs the following set
of data for each field of view:

SEGMENTATION FEATURES

Four features are reported that characterize the
segmentation of the image.

SEGMENTED OBJECT COUNT

The number of objects that were segmented in the
FOV. This number may be different from the
number classified since objects that are too
close to the edge of the FOV are not classified.

OBJECT COUNTS OF INITIAL BOX FILTER

The number of objects rejected by each of the
five stages of the initial box filter.

OBJECT COUNTS OF STAGE1 CLASSIFIER

The number of objects classified as normal,
abnormal, or artifact by Stage1's box classifier,
and the number classified as normal, abnormal, or
artifact at the end of the Stage1 classifier.
(Six numbers are recorded: three for the results
of the Stage1 box classifier, and three for the
results of the Stage1 classifier.)

OBJECT COUNTS OF STAGE2 CLASSIFIER

The number of objects classified as normal,
abnormal, or artifact by Stage2's box classifier,
and the number classified as normal, abnormal, or
artifact at the end of the Stage2 classifier.
(Six numbers are recorded: three for the results
of the Stage2 box classifier and three for the
results of the Stage2 classifier.)

OBJECT COUNTS OF STAGE3 CLASSIFIER

The number of objects classified as normal,
abnormal, or artifact by Stage3's box classifier,
and the number classified as normal, abnormal, or
artifact at the end of the Stage3 classifier.
(Six numbers are recorded: three for the results
of the Stage3 box classifier and three for the
results of the Stage3 classifier.)

OBJECT COUNT OF STAGE4 CLASSIFIER

The number of objects classified as abnormal by
the Stage4 classifier.

OBJECT COUNTS OF PLOIDY CLASSIFIER

Two values are computed: the number of objects
classified as abnormal by the first stage of the
Ploidy classifier and the number of objects
classified as highly abnormal by the second stage
of the Ploidy classifier.

OBJECT COUNTS OF STAGE4 + PLOIDY CLASSIFIER

Two values are computed: the number of objects
classified as abnormal by the Stage4 classifier
that were also classified as abnormal by the
first stage of the Ploidy classifier, and the
number of objects classified as abnormal by the
Stage4 classifier that were also classified
highly abnormal by the second stage of the Ploidy
classifier.

STAGE2/STAGE3/STAGE4/PLOIDY ALARM CONFIDENCE HISTOGRAM

Histograms for the alarm confidence of the
Stage2, Stage3, Stage4, and Ploidy alarms
detected in an FOV.

STAGE2/STAGE3 ALARM COUNT HISTOGRAM

Two alarm count histograms, one each for the
Stage2 and Stage3 alarms detected in an FOV.

STAGE2/STAGE3 ALARM IOD HISTOGRAM

Histograms for the Integrated Optical Density
(IOD) of objects classified as abnormal by Stage2
and Stage3 in an FOV.

INTERMEDIATE CELL IOD-SIZE SCATTERGRAMS

Two IOD vs. size scattergrams of the normal
intermediate cells detected in the FOV.

INTERMEDIATE CELL STAIN FEATURES

Six features are accumulated for each object
classified as an intermediate cell. These
features are all stain related and are used as
reference values in the slide level
classification algorithms.

CONTEXTUAL STAGE1 ALARM

Number of Stage1 alarms within a 200 pixel radius
of a Stage2 alarm in the same FOV.

CONTEXTUAL STAGE2 ALARM

Number of Stage2 alarms located within a 200
pixel radius of a Stage3 alarm in the same FOV.

ESTIMATED CELL COUNT

An estimate of the number of squamous cells
present in the image.

ATYPICALITY INDEX

An 8x8 array of confidences for all objects sent
to the atypicality classifier.

SEGMENTATION ROBUSTNESS AND CLASSIFICATION DECISIVENESS

A set of confidence measures that an object was
correctly segmented and classified. This
information is available for Stage2 and Stage3
alarms.

SINGLE CELL ADDON FEATURES

A set of eight features for each object
classified as a Stage3 alarm. This information
will be used in conjunction with slide reference
features to gauge the confidence of the Stage3
alarms.
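
As an illustration of the contextual alarm outputs listed above, the
following sketch counts lower-stage alarms that lie within a 200 pixel
radius of a higher-stage alarm in the same FOV. It reflects one reading
of the text, which is an assumption: each lower-stage alarm is counted
once if it lies within the radius of at least one higher-stage alarm,
distances are measured between object centers, and a distance of exactly
200 pixels counts as "within". The function name is hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) center of an alarm, in pixels

CONTEXT_RADIUS = 200.0  # pixels; the radius stated in the text


def contextual_alarm_count(anchors: List[Point], candidates: List[Point],
                           radius: float = CONTEXT_RADIUS) -> int:
    """Count candidates that lie within `radius` of at least one anchor.

    For the contextual Stage1 output, `anchors` would be the Stage2 alarms
    and `candidates` the Stage1 alarms of the same FOV.
    """
    count = 0
    for cx, cy in candidates:
        if any(math.hypot(cx - ax, cy - ay) <= radius for ax, ay in anchors):
            count += 1
    return count
```

For example, with a single anchor at (0, 0), candidates at (100, 0) and
(0, 200) are counted and a candidate at (300, 0) is not.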

Prior to 20x magnification processing, an FOV
selection and integration process is performed at a 4x
magnification scan of the slide to determine the
likelihood that each FOV contains abnormal cells.
Next, the computer system 540 acquires the FOVs in
descending order: from higher likelihood of abnormal
cells to lower likelihood.
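
The acquisition order described above amounts to sorting the FOVs by the
likelihood assigned during the 4x scan. A minimal sketch, assuming the
likelihoods are available as a mapping from a hypothetical FOV
identifier to a score:

```python
from typing import Dict, List


def acquisition_order(likelihoods: Dict[str, float]) -> List[str]:
    """Return FOV identifiers sorted from highest to lowest likelihood."""
    ranked = sorted(likelihoods.items(), key=lambda item: item[1], reverse=True)
    return [fov_id for fov_id, _ in ranked]
```

Calling `acquisition_order({"a": 0.2, "b": 0.9, "c": 0.5})` returns
`["b", "c", "a"]`.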

Image segmentation 10 converts gray scale image
data into a binary image of object masks. These masks
represent a group of pixels associated with a
potential cell nucleus. Using these masks, processing
can be concentrated on regions of interest rather than
on individual pixels, and the features that are
computed characterize the potential nucleus.

The image segmentation process 10 is based on
mathematical morphology functions and label
propagation operations. It takes advantage of the
power of nonlinear processing techniques based on set
theoretic concepts of shape and size, which are
directly related to the criteria used by humans to
classify cells. In addition, constraints that are
application specific are incorporated into the
segmentation processes of the invention; these include
object shape, size, dark and bright object boundaries,
background density, and nuclear/cytoplasmic
relationships. The incorporation of application-
specific constraints into the image segmentation 10
process is a unique feature of the AutoPap® 300
System's processing strategy.

Refer now to Figure 3A which shows the image
segmentation process 10 of the invention in more
detail. The image segmentation process is described
in a U.S. Patent application entitled "Method for
Identifying Objects Using Data Processing Techniques"
by Shih-Jong James Lee. For each image 29, the image
segmentation process 10 creates a mask which uniquely
identifies the size, shape and location of every
object in an FOV. There are three steps involved in
image segmentation 10 after the 20x image data 29 is
received in 20x imaging step 28: contrast enhancement
30, image thresholding 32, and object refinement 34.

During contrast enhancement 30 the apparatus of
the invention first enhances, or normalizes, the
contrast between potential objects of interest and
their backgrounds: bright areas become brighter and
dark areas become darker. This phase of processing
creates an enhanced image 31. During image
thresholding 32 a threshold test identifies objects of
interest and creates a threshold image 33. The
threshold image 33 is applied to the enhanced image 31
to generate three binary mask images. These binary
mask images are further refined and combined by an
object refinement process 34 to identify the size,
shape, and location of objects. The contrast
enhancement process 30 increases the contrast between
pixels that represent the object of interest and
pixels that represent the background.

Refer now to Figure 3B which shows the contrast
enhancement process 30. The process first normalizes
the image background 36 by pixel averaging. The
enhanced image 31 is derived from the difference
between the original image 29 and the normalized
background 40, computed in enhanced object image
transformation step 44. As part of the image contrast
enhancement process 30, each object in the field of
view undergoes a threshold test 38 using threshold
data 42 to determine whether the brightness of the
object lies within a predetermined range. The
contrast enhancement process stops at step 47.
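
The background normalization and difference steps above can be sketched
as follows. This is an illustration under stated assumptions rather than
the method of Figure 3B: the background is estimated with a square
averaging window whose radius is a made-up parameter, the window is
clipped at the image borders, and the difference is taken as background
minus original so that dark objects such as nuclei become large positive
values. The function names are hypothetical.

```python
from typing import List

Image = List[List[float]]


def box_average(image: Image, radius: int) -> Image:
    """Average each pixel over a (2*radius+1) square window, clipped at borders."""
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        rows = range(max(0, r - radius), min(h, r + radius + 1))
        for c in range(w):
            cols = range(max(0, c - radius), min(w, c + radius + 1))
            total = sum(image[i][j] for i in rows for j in cols)
            out[r][c] = total / (len(rows) * len(cols))
    return out


def enhance(image: Image, radius: int = 1) -> Image:
    """Enhanced image = estimated background minus original (assumed sign)."""
    background = box_average(image, radius)
    return [[background[r][c] - image[r][c] for c in range(len(image[0]))]
            for r in range(len(image))]
```

A uniform image yields an enhanced image of zeros, while a dark spot on
a bright field yields a positive response at the spot. A practical
background estimate would presumably use a window much larger than the
objects of interest.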

At this point, the apparatus of the invention
begins to differentiate artifacts from cells so that
artifacts are eliminated from further analysis. The
apparatus of the invention provides a range of
predetermined values for several characteristics,
including but not limited to brightness, size and
shape of nucleus, cytoplasm and background, of the
objects of interest. Objects whose characteristics do
not lie within the range of these values are assumed
to be artifacts and excluded from further
classification.
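
The range test described above can be sketched as a check of each
characteristic against a pair of limits. The characteristic names and
the numeric limits below are hypothetical placeholders; the text gives
no actual values.

```python
from typing import Dict, List, Tuple

# Hypothetical (low, high) limits per characteristic; not values from the text.
RANGES: Dict[str, Tuple[float, float]] = {
    "brightness": (40.0, 180.0),
    "area": (30.0, 2000.0),
}


def within_ranges(features: Dict[str, float],
                  ranges: Dict[str, Tuple[float, float]] = RANGES) -> bool:
    """True if every listed characteristic lies inside its allowed range."""
    return all(low <= features[name] <= high
               for name, (low, high) in ranges.items())


def split_artifacts(objects: List[Dict[str, float]]):
    """Separate objects kept for classification from presumed artifacts."""
    kept = [o for o in objects if within_ranges(o)]
    artifacts = [o for o in objects if not within_ranges(o)]
    return kept, artifacts
```

An object fails the test as soon as any one characteristic falls outside
its range, which matches the exclusion rule stated in the paragraph.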

The brightness of an image is characterized by
the histogram functions shown in Figures 3C and 3D,
which determine how many pixels within a gray scale
FOV have a certain image intensity. Ideally, the
histogram is a curve 48 having three peaks, as shown
in the upper histogram in Figure 3C. The three peaks
correspond to three brightness levels usually found
in the images: the background, the cytoplasm, and the
nuclei. If the number of pixels of each brightness
level were plotted as a histogram, the largest,
brightest peak would be the background since this
usually makes up the largest portion of the image
29. The medium brightness peak would correspond to
the area of cytoplasm, and the darkest and shortest
peak would correspond to the cell nuclei.
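
To make the histogram idea concrete, the sketch below counts pixels per
gray level and lists the local peaks. It is illustrative only: it
assumes 8-bit integer gray levels, and the simple peak test (a count
strictly greater than both neighbors) is an assumption that would need
smoothing on real, noisy histograms. The function names are
hypothetical.

```python
from typing import List


def gray_histogram(image: List[List[int]], levels: int = 256) -> List[int]:
    """Count how many pixels have each gray level."""
    counts = [0] * levels
    for row in image:
        for value in row:
            counts[value] += 1
    return counts


def local_peaks(counts: List[int]) -> List[int]:
    """Gray levels whose count is strictly greater than both neighbors."""
    peaks = []
    for i, n in enumerate(counts):
        left = counts[i - 1] if i > 0 else 0
        right = counts[i + 1] if i + 1 < len(counts) else 0
        if n > left and n > right:
            peaks.append(i)
    return peaks
```

In the ideal three-peak case, the brightest and tallest peak would be
taken as background, the middle one as cytoplasm, and the darkest and
shortest as nuclei.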

This ideal representation rarely occurs since 
overlapped cells and cytoplasm tend to distort the 
results of the histogram as shown in the lower 
histogram 50 in Figure 3D. To reduce the impact of 
overlapping cells on brightness calculations, the 
apparatus of the invention applies morphological 
functions, such as repeated dilations and erosions, to 
remove overlapped objects from the image before the 
histogram is calculated. 

Referring again to Figure 3A, in addition to the
contrast enhanced image 31, a threshold image 33 is
generated by a morphological processing sequence. A
threshold test 32 is then performed on the enhanced
image using the threshold image 33 to produce a binary
image. The threshold test compares each pixel's value
to the threshold image pixel value. The apparatus of
the invention then identifies as an object pixel any
pixel in the enhanced image that has an intensity
greater than the corresponding pixel value of the
threshold image.

The threshold image is combined with two 
predetermined offset values to generate three 
threshold images 135, 137 and 139. The first offset 
is subtracted from each gray scale pixel value of the 
original threshold image 33 to create a low threshold 
image. The second offset value is added to each gray 
scale pixel value of the threshold image to create a 
high threshold image. Each of these images (medium
threshold, which is the original threshold image, low
threshold, and high threshold) is separately combined
with the enhanced image to provide three binary
threshold images: a low threshold binary image 35; a
medium threshold binary image 37; and a high threshold
binary image 39.
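The per-pixel threshold test and the derivation of the low, medium and high binary images can be sketched as follows. This is a minimal illustration on nested lists; the function names and the offset values in the example are assumptions, since the text only calls the offsets "predetermined".

```python
def threshold_test(enhanced, threshold):
    """Mark a pixel as object (1) when it exceeds the matching threshold pixel."""
    return [
        [1 if e > t else 0 for e, t in zip(e_row, t_row)]
        for e_row, t_row in zip(enhanced, threshold)
    ]


def three_binary_images(enhanced, threshold, low_offset, high_offset):
    """Build the low, medium and high threshold binary images."""
    low_threshold = [[t - low_offset for t in row] for row in threshold]
    high_threshold = [[t + high_offset for t in row] for row in threshold]
    return (
        threshold_test(enhanced, low_threshold),
        threshold_test(enhanced, threshold),
        threshold_test(enhanced, high_threshold),
    )


# One-row example with hypothetical offsets of 20 gray levels each.
enhanced = [[30, 50, 90]]
threshold = [[40, 40, 40]]
low, medium, high = three_binary_images(enhanced, threshold, 20, 20)
```

The low threshold image admits the most pixels and the high threshold image the fewest, which is why the later steps treat the medium image as the baseline and use the other two to add or recover objects.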

Refer now to Figure 3E where the three binary
threshold images are refined, beginning with the medium
threshold binary image 37. The medium threshold binary
image 37 is refined by eliminating holes and detecting
the dark edges 52 of the objects of interest in the
enhanced image. Dark edges 54 are linked using a small
morphological closing and opening sequence to fill in
holes. Dark edges are detected by determining where
there is a variation in intensity between a pixel and
its neighboring pixels. Thereafter, boundaries of an
edge are detected 56 and identified as a true dark edge
mask. The medium threshold binary image 37 is then
combined in a set union 58 with the edge boundary
detected image 56 to create a dark edge incorporated
image 74.

As illustrated in Figure 3F, bright edges 64 of the
original image are then excluded from the medium
threshold binary image 37. The bright edges of the
enhanced image are detected in a manner similar to dark
edge detection. The boundary of the dark edge
incorporated image 74 is detected and combined with the
bright edge enhanced image 64 in a set intersection
operation 68. The results are subtracted 70 from the
dark edge incorporated image 74 to create a bright edge
excluded image 72. The medium threshold binary image 37
is now represented by the bright edge excluded image 72.

Refer to Figures 3G, 3H and 3I, which show that
objects 80 from the bright edge excluded image 72 are
completed by filling any holes 82 that remain. Holes 82
can be filled without the side effect of connecting
nearby objects. Small holes 82 are detected and then
added to the original objects 80. To further refine the
medium threshold binary image 37, the bright edge
excluded image 72 is inverted (black becomes white and
vice versa). Objects that are larger than a
predetermined size are identified and excluded from the
image by a connected component analysis operation. The
remaining image is then added to the original image,
which provides the completed medium threshold binary
mask that fills the holes 82.

To complete the medium threshold binary image 37,
connected objects that may not have been separated using
the bright edge detection process of Figure 3F are
separated. To do so, objects in the medium threshold
binary mask 37 are eroded by a predetermined amount and
then dilated by a second predetermined amount. The
amount of erosion exceeds the amount of dilation so that
objects after dilation are smaller than before erosion.
This separates connected objects.
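The erode-more-than-dilate step can be sketched with a 3x3 structuring element. The element shape, the step counts, and all function names are assumptions for illustration; the text specifies only that erosion exceeds dilation.

```python
def _window(image, r, c):
    """Yield the 3x3 neighbourhood of (r, c); pixels outside the image count as 0."""
    rows, cols = len(image), len(image[0])
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            rr, cc = r + dr, c + dc
            yield image[rr][cc] if 0 <= rr < rows and 0 <= cc < cols else 0


def erode(image):
    """Keep a pixel only if its whole 3x3 neighbourhood is set."""
    return [[1 if all(_window(image, r, c)) else 0
             for c in range(len(image[0]))] for r in range(len(image))]


def dilate(image):
    """Set a pixel if any pixel in its 3x3 neighbourhood is set."""
    return [[1 if any(_window(image, r, c)) else 0
             for c in range(len(image[0]))] for r in range(len(image))]


def separate_touching_objects(image, erode_steps, dilate_steps):
    """Erode, then dilate by a smaller amount, so thin bridges do not regrow."""
    if erode_steps <= dilate_steps:
        raise ValueError("erosion must exceed dilation")
    for _ in range(erode_steps):
        image = erode(image)
    for _ in range(dilate_steps):
        image = dilate(image)
    return image


# Two 5x5 blobs joined by a one-pixel-high bridge along row 2.
mask = [[1] * 5 + [0] * 3 + [1] * 5 for _ in range(5)]
mask[2] = [1] * 13
separated = separate_touching_objects(mask, erode_steps=2, dilate_steps=1)
```

After the call the bridge pixels are cleared and each blob survives as a smaller, isolated object, which matches the statement that objects after dilation are smaller than before erosion.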

A morphological closing residue operation is applied
to determine separation boundaries. A separation
boundary is subtracted from the hole-filled image to
create an overlap object separated binary image. To
ensure that no objects have been lost in this process,
the overlap object separated image is dilated to
generate an object mask. Small objects not included in
the object mask are combined in a set union with the
object separation image to provide an object recovered
image.

Referring again to Figure 3A, in the last step, the
high and low threshold binary images are combined with
the object recovered image (the refined medium threshold
binary image) to create final object masks 41, 43 and
45. All objects identified in the high threshold binary
image 39 are added to the refined medium threshold
binary image 37 using a set union operation. The
resulting mask is eroded by a small amount and dilated
by a large amount, so that all objects are connected to
a single object. This mask is combined with the low
threshold binary mask 35. Objects in the low threshold
binary mask 35 that are not in close proximity to
objects in the medium threshold binary mask 37 are added
to the image. These objects are added to the refined
medium threshold image 43 to create the finished mask. A
connected components labeling procedure removes small or
oddly shaped objects and assigns a unique label to each
remaining connected object.
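A connected components labeling pass of the kind mentioned above might look like the following sketch. It assumes 4-connectivity and filters on size only; the connectivity, the `min_size` parameter, and the omission of any shape test are illustrative assumptions rather than details from the text.

```python
from collections import deque


def label_components(mask, min_size=1):
    """Label 4-connected objects; objects smaller than min_size are dropped.

    Returns (labels, count), where labels holds 0 for background or dropped
    pixels and 1..count for the retained objects.
    """
    rows, cols = len(mask), len(mask[0])
    labels = [[0] * cols for _ in range(rows)]
    next_label = 1
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or labels[r][c] != 0:
                continue
            # Breadth-first flood fill from the first unlabeled object pixel.
            labels[r][c] = next_label
            pixels = [(r, c)]
            queue = deque(pixels)
            while queue:
                y, x = queue.popleft()
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and labels[ny][nx] == 0):
                        labels[ny][nx] = next_label
                        pixels.append((ny, nx))
                        queue.append((ny, nx))
            if len(pixels) < min_size:
                for y, x in pixels:
                    labels[y][x] = -1  # visited but rejected
            else:
                next_label += 1
    cleaned = [[max(value, 0) for value in row] for row in labels]
    return cleaned, next_label - 1


mask = [
    [1, 1, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 0, 0],
    [0, 1, 1, 1],
]
labels, count = label_components(mask, min_size=2)
```

The isolated pixel in the second row is discarded because it falls below the size limit, while the two larger objects receive labels 1 and 2.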

The segmented image 15 is used by the feature
extraction process 12 to derive the features for each
object. The features computed are characteristic
measures of the object such as size, shape, density,
and texture. These measurements are input to the
classifiers 14 and allow the apparatus of the invention
to discriminate among normal cells, potentially
abnormal cells, and artifacts. The features are defined
below.

The object classification process 14 consists of a
series of classifiers that are grouped in stages. Each
stage takes potentially abnormal objects from the
previous stage and refines the classification result
further using sets of new features to improve the
accuracy of classification. At any stage, objects that
are classified as normal or artifact are not classified
further.
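The staged control flow described above can be sketched as a simple cascade. The stage functions, feature names, and thresholds below are toy placeholders invented for illustration; only the rule that "normal" and "artifact" results are final comes from the text.

```python
def run_cascade(features, stages):
    """Pass one object through the stages in order.

    Each stage returns "normal", "artifact" or "abnormal". Only objects that a
    stage calls "abnormal" move on to the next stage; any other label is final.
    Returns (name of the deciding stage, label).
    """
    decided_by, label = None, "abnormal"
    for name, classify in stages:
        decided_by, label = name, classify(features)
        if label != "abnormal":
            break
    return decided_by, label


# Toy stages with made-up thresholds, for illustration only.
def toy_stage1(f):
    return "artifact" if f["size"] > 500 else "abnormal"


def toy_stage2(f):
    return "normal" if f["density"] < 0.2 else "abnormal"


STAGES = [("stage1", toy_stage1), ("stage2", toy_stage2)]
result = run_cascade({"size": 100, "density": 0.9}, STAGES)
```

Because later stages see only the objects still labeled abnormal, expensive features need to be computed for a shrinking subset of objects.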

Now refer to Figure 4A which shows the classifier
process of the invention. Initial Box Filter
classifiers 90 discard obvious artifacts. The data then
proceeds through the Stage1, Stage2, and Stage3
classifiers 92, 94, 96 and ends with the Stage4 and
Ploidy classifiers 98, 100.

The purpose of the Initial Box Filter classifier 90
is to identify objects that are obviously not cell
nuclei, using as few features as possible, features
that preferably are not difficult to compute. Only the
features required for classification are computed at
this point. This saves processing time over the whole
slide. The initial box filter 90 comprises five separate
classifiers designed to identify various types of
artifacts. The classifiers operate in series as shown in
Figure 4B.

As an object passes through the initial box filter,
it is tested by each classifier shown in Figure 4B. If
it is classified as an artifact, the object
classification 14 is final and the object is not sent to
the other classifiers. If it is not, the object goes to
the next classifier in the series. If an object is not
classified as an artifact by any of the five classifiers
102, 104, 106, 108 and 110, it will go to the Stage1
classifier 92.

Input to the initial box filter 90 comprises a set of
feature measurements for each object segmented. The
output comprises the following:

o    The number of objects classified as artifact by
     each of the classifiers, which results in five
     numbers.
o    The Stage1, Stage2, and Stage3 classification codes
     for each object classified as an artifact.
o    An "active" flag that indicates whether the object
     has a final classification. If the object is
     classified as an artifact, it is not active anymore
     and will not be sent to other classifiers.



The initial box filter 90 uses 15 features, which are
listed in the following table, for artifact rejection.
Each classifier within the initial box filter 90 uses a
subset of these 15 features. The features are grouped by
their properties.

Feature type                  Feature name(s)

Condensed Feature             condensed_area_percent
Context Texture Feature       big_blur_ave
Contrast Feature              nc_contrast_orig
Density Features              mean_orig2
                              normalized_mean_od_r3
                              integrated_density_orig
                              nuc_bright_sm
Nucleus/Cytoplasm Texture
Contrast Feature              nuc_edge_5_5_sm
Shape Features                compactness
                              density_1_2
                              density_2_3
Size Feature                  perimeter
Texture Features              sd_orig2
                              nuc_blur_sd
                              nuc_edge_9_mag

The initial box filter is divided into five decision
rules. Each decision is based on multiple features. If
the feature value of the object is outside the range
allowed by the decision rule, the object is classified
as an artifact. The decision rule for each of the
initial box filter classifiers is defined as follows:

Box1 102

if (
    perimeter >= 125 OR
    compactness >= 13 OR
    density_2_3 >= 7.5 OR
    density_1_2 >= 10
)
then
    the object is an artifact.

Box2 104

else if (
    mean_orig2 < 20 OR
    sd_orig2 < 5.3 OR
    sd_orig2 > 22.3
)
then
    the object is an artifact.

Artifact Filter for Unfocused Objects and Polies #1 106

else if (
    nuc_blur_sd < 1.28 OR
    big_blur_ave < ( -1.166 * nuc_blur_sd + 2.89 ) OR
    big_blur_ave < ( 4.58 * condensed_area_percent + 0.8 ) OR
    compactness > ( -0.136 * nuc_edge_9_mag + 18.05 ) OR
    nuc_edge_5_5_sm > ( -1.57 * compactness + 28.59 )
)
then
    the object is an artifact.

Artifact Filter for Graphite #2 108

else if (
    nc_contrast_orig > ( -4.162 * normalized_mean_od_r3 + 615.96 )
)
then
    the object is an artifact.

Artifact Filter for Cytoplasm #3 110

else if (
    integrated_density_orig < ( 433933.2 * nuc_bright_sm - 335429.8 )
)
then
    the object is an artifact.

else
    continue the classification process with the
    Stage1 Box Filter.
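The series arrangement of the initial box filter can be sketched in code using the first two rules, Box1 and Box2, whose thresholds are the most legible. This is a partial sketch, not the full five-rule filter: the comparison operator for `density_1_2` is assumed to be `>=` by analogy with the other Box1 tests, and the dictionary-based interface and result fields are hypothetical.

```python
def box1(f):
    """Box1 102: reject on size and shape limits (density_1_2 operator assumed >=)."""
    return (f["perimeter"] >= 125 or f["compactness"] >= 13
            or f["density_2_3"] >= 7.5 or f["density_1_2"] >= 10)


def box2(f):
    """Box2 104: reject on gray-level mean and standard deviation limits."""
    return f["mean_orig2"] < 20 or f["sd_orig2"] < 5.3 or f["sd_orig2"] > 22.3


# Only two of the five rules are included; the remaining three would follow
# in the same list, in the order given in the text.
RULES = [("Box1", box1), ("Box2", box2)]


def initial_box_filter(features):
    """Run the rules in series; the first rule that fires makes the object inactive."""
    for name, rule in RULES:
        if rule(features):
            return {"label": "artifact", "rejected_by": name, "active": False}
    return {"label": None, "rejected_by": None, "active": True}


candidate = {"perimeter": 80, "compactness": 11, "density_2_3": 2.0,
             "density_1_2": 3.0, "mean_orig2": 60, "sd_orig2": 12.0}
outcome = initial_box_filter(candidate)
```

Recording which rule rejected an object makes it straightforward to accumulate the per-classifier artifact counts that the filter reports as output.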



Up to 40% of objects that are artifacts are
identified and eliminated from further processing
during the initial box filter 90 processing. This step
retains about 99% of cells, both normal and potentially
abnormal, and passes them to Stage1 92 for further
processing.

Objects that are not classified as artifacts by the
classifiers of the initial box filter 90 are passed to
Stage1 92, which comprises a box filter classifier and
two binary decision tree classifiers as shown in Figure
4C. The Stage1 box filter 92 is used to discard objects
that are obviously artifacts or normal cells, using new
features which were not available to the initial box
filter 90. The binary decision trees then attempt to
identify the abnormal cells using a more complex
decision process.

The box filter 112 identifies normal cells and
artifacts: the classification of these objects is
final. Objects not classified as normal or artifact are
sent to Classifier#1 114 which classifies the object as
either normal or abnormal. If an object is classified as
abnormal, it is sent to Classifier#2 116, where it is
classified as either artifact or abnormal. Those objects
classified as abnormal by Classifier#2 116 are sent to
Stage2 94. Any objects classified as artifact by any of
the classifiers in Stage1 92 are not sent to other
classifiers.

The input to Stage1 92 comprises a set of feature
measurements for each object not classified as an
artifact by the box filters 90. The output comprises the
following:

o    The numbers of objects classified as normal,
     abnormal, and artifact by the Stage1 box
     classifier, 3 numbers.
o    The numbers of objects which were classified as
     normal, abnormal or artifact at the end of the
     Stage1 classifier 92.
o    An "active" flag that indicates whether the object
     has a final classification. If the object has been
     classified as an artifact, it is not active anymore
     and is not sent to other classifiers.



The features that are used by each of the Stage1
classifiers 92 are listed in the following tables.
They are categorized by their properties.

Stage1 Box Filter 112

Feature type                  Feature name(s)

Condensed Features            condensed_count
                              condensed_area_percent
                              condensed_compactness
Context Density Feature       mean_background
Context Texture Features      small_blur_ave
                              big_blur_sd
                              sm_blur_sd
Contrast Feature              edge_contrast_orig
Density Feature               integrated_density_od
Nucleus/Cytoplasm Relation
Feature                       nc_score_r4
Shape Feature                 compactness
Texture Feature               texture_correlation3

Stage1, Classifier#1 114

Feature type                  Feature name(s)

Condensed Feature             condensed_count
Context Texture Features      big_blur_ave
                              small_edge_9_9
                              big_edge_5_mag
                              big_edge_9_9
                              sm_blur_sd
Contrast Feature              edge_contrast_orig
Density Feature               autothresh_enh
Nucleus/Cytoplasm Relation
Features                      mod_N_C_ratio
                              cell_nc_ratio
                              nc_score_alt_r3
Nucleus/Cytoplasm Texture
Contrast Feature              nuc_edge_2_mag_big
Shape Features                compactness2
                              density_0_1
                              inertia_2_ratio
Texture Features              cooc_inertia_4_0
                              sd_orig
                              nonuniform_run
                              nuc_edge_2_mag
                              nuc_blur_sk
                              sd_enh2
                              edge_density_r3
                              cooc_homo_1_0

Stage1, Classifier#2 116

Feature type                  Feature name(s)

Context Density Feature       big_bright
Context Texture Features      big_edge_2_dir
                              big_edge_9_9
Contrast Feature              edge_contrast_orig
Density Features              integrated_density_orig2
                              normalized_integrated_od
                              mod_nuc_IOD_sm
                              mod_nuc_OD_sm
                              normalized_mean_od
Nucleus/Cytoplasm
Relation Features             nc_score_r4
                              cell_semi_isolated
                              mod_N_C_ratio
Nucleus/Cytoplasm Texture
Contrast Features             nuc_edge_9_mag_sm
                              nuc_edge_9_9_big
Shape Feature                 area_inner_edge
Size Feature                  perimeter
Texture Features              below_autothresh_enh2
                              edge_density_r3
                              nuc_blur_ave
                              cooc_energy_4_0
                              cooc_entropy_1_135
                              nuc_edge_2_dir
                              cooc_corr_1_90
                              texture_inertia3

The decision rules used in each classifier are
defined as follows:
Box Filter 112

if (
    sm_blur_ave <= 4.98465 AND
    edge_contrast_orig <= -42.023
)
then
    the object is normal

else if (
    condensed_count <= 3.5 AND
    compactness <= 10.6828 AND
    sm_blur_ave <= 3.0453 AND
    integrated_density_od <= 19925 AND
    condensed_area_percent > 0.0884
)
then
    the object is an artifact

else if (
    condensed_count <= 3.5 AND
    compactness > 10.6828 AND
    condensed_compactness <= 19.5789
)
then
    the object is an artifact

else if (
    integrated_density_od <= 22374 AND
    big_blur_sd <= 3.92333 AND
    sm_blur_sd <= 1.89516
)
then
    the object is normal

else if (
    integrated_density_od <= 22374 AND
    big_blur_sd <= 3.92333 AND
    sm_blur_sd > 1.89516 AND
    nc_score_r4 <= 0.36755 AND
    texture_correlation3 <= 0.7534 AND
    mean_background > 226.66
)
then
    the object is normal

else if (
    integrated_density_od <= 22374 AND
    big_blur_sd <= 3.92333 AND
    sm_blur_sd > 1.89516 AND
    nc_score_r4 <= 0.36755 AND
    texture_correlation3 > 0.7534
)
then
    the object is normal

else if (
    integrated_density_od <= 10957.5 AND
    big_blur_sd <= 3.92333 AND
    sm_blur_sd > 1.89516 AND
    nc_score_r4 > 0.36755
)
then
    the object is normal

else
    the object continues the classification process
    in Stage1, Classifier#1.



Stage1, Classifier#1 114

This classifier is a binary decision tree that uses
a linear feature combination at each node to separate
normal cells from abnormal cells. The features described
in the previous tables make up the linear combination.
The features are sent to each node of the tree. The
importance of each feature at each of the nodes may be
different and was determined during the training
process.

Stage1, Classifier#2 116

This classifier is a binary decision tree that uses
a linear feature combination at each node to separate
artifacts from abnormal cells. The features that make up
the tree are listed in a previous table.
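A binary decision tree that tests a linear feature combination at each node can be sketched as follows. The tree layout, the weights, and the thresholds are invented for illustration; the trained values used by the classifiers are not given in the text.

```python
def classify_with_tree(node, features):
    """Walk the tree until a leaf (a plain string label) is reached.

    Each internal node holds per-feature weights and a threshold; the weighted
    sum of the feature values decides which branch to follow.
    """
    while isinstance(node, dict):
        score = sum(weight * features[name]
                    for name, weight in node["weights"].items())
        node = node["low"] if score <= node["threshold"] else node["high"]
    return node


# Hypothetical two-level tree; weights and thresholds are placeholders.
TOY_TREE = {
    "weights": {"compactness2": 1.0, "sd_orig": -0.5},
    "threshold": 4.0,
    "low": "normal",
    "high": {
        "weights": {"sd_orig": 1.0},
        "threshold": 10.0,
        "low": "normal",
        "high": "abnormal",
    },
}

label = classify_with_tree(TOY_TREE, {"compactness2": 12.0, "sd_orig": 12.0})
```

Because each node carries its own weights, the same feature can matter a great deal at one node and very little at another, which is the behavior described for the trained classifiers.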

A significant proportion of the objects classified
as abnormal by Stage1 92 are normal cells and artifacts.
Stage2 94 attempts to remove these, leaving a purer set
of abnormal cells. Stage2 94 comprises a box filter 118,
which discards objects that are obviously artifacts or
normal cells, and two binary decision trees shown in
Figure 4D.

The objects classified as abnormal by Stage1 92
enter Stage2 94. The box filter 118 identifies normal
cells and artifacts; the classification of these
objects is final. Objects not classified as normal or
artifact are sent to Classifier#1 120, which classifies
the object as either normal or abnormal. If an object is
classified as abnormal, it is sent to Classifier#2 122,
where it is classified as either artifact or abnormal.
Those objects classified as abnormal by Classifier#2 122
are sent to Stage3 96. Any objects classified as normal
or artifact by one of the classifiers in Stage2 94 are
not sent to other classifiers.

The input to Stage2 94 comprises a set of feature
measurements for each object classified as abnormal by
Stage1. The output comprises the following:

o    The numbers of objects classified as normal,
     abnormal, and artifact by the box filter (3
     numbers).
o    The numbers of objects which were classified as
     normal, abnormal or artifact at the end of the
     Stage2 94 classifier.
o    An "active" flag, which indicates whether the
     object has a final classification. (If it has been
     classified as artifact or normal it is not active
     anymore, and will not be sent to other
     classifiers.)

Features Required by the Stage2 94 Classifiers

The features that are used by each of the Stage2 94
classifiers are listed in the following tables. They are
categorized by feature properties.

Stage2 94 Box Filter

Feature type                  Feature name(s)

Condensed Features            condensed_avg_area
Context Density Features      mean_background
Context Texture Features      sm_blur_sd
                              big_blur_ave
                              sm_blur_ave
Contrast Feature              nc_contrast_orig
Density Features              integrated_density_od
                              normalized_integrated_od_r3
Shape Features                compactness
                              shape_score
Texture Features              nuc_blur_sd
                              texture_inertia4
                              texture_range4
                              edge_density_r3

Stage2 94, Classifier 1

Feature type                  Feature name(s)

Context Texture Features      sm_blur_ave
                              big_edge_2_dir
                              big_edge_5_mag
                              big_blur_ave
                              big_edge_9_9
                              big_edge_3_3
Density Feature               min_od
Shape Feature                 sbx (secondary box test)
Size Features                 area_inner_edge
                              area
                              nuclear_max
                              perimeter2
Texture Features              nuc_blur_ave
                              nuc_blur_sk

Stage2 94, Classifier 2

Feature type                  Feature name(s)

Condensed Feature             condensed_count
Context Density Features      mean_background
                              mean_outer_od
Contrast Features             edge_contrast_orig
                              nc_contrast_orig
Density Features              nuc_bright_big
                              mod_nuc_OD_big
Shape Features                compactness2
                              density_0_1
Texture Features              nuc_edge_9_mag
                              nuc_blur_ave
                              sd_orig2
                              nuc_blur_sd
                              nuc_edge_2_mag



The Stage2 94 classifier comprises a box filter and
two binary decision trees as shown in Figure 4D. The
decision rules used in each classifier are defined as
follows:

Box Filter 118

if (
    condensed_avg_area <= 9.4722 AND
    mean_background > 235.182
)
then
    the object is normal

else if (
    condensed_avg_area > 9.4722 AND
    condensed_compactness <= 30.8997 AND
    nuc_blur_sd <= 5.96505 AND
    mean_background <= 233.46 AND
    compactness > 10.4627 AND
    texture_inertia4 <= 0.3763
)
then
    the object is normal

else if (
    integrated_density_od <= 30253 AND
    condensed_compactness <= 22.0611 AND
    sm_blur_sd <= 6.51617 AND
    shape_score <= 38.8071 AND
    texture_range4 <= 72.5 AND
    integrated_density_od > 15558.5
)
then
    the object is an artifact

else if (
    integrated_density_od <= 26781.5 AND
    edge_density_r3 <= 0.29495 AND
    mean_background > 233.526
)
then
    the object is an artifact

else if (
    integrated_density_od2 23461 AND
    normalized_integrated_od_r3 <= 11176.7 AND
    big_blur_ave <= 5.0609 AND
    nc_contrast_orig > 37.1756 AND
    sm_blur_ave <= 3.0411
)
then
    the object is normal

else
    continue the classification process with Stage2
    94, Classifier#1 120

Stage2 Classifier#1 120

This classifier is a binary decision tree that uses
a linear feature combination at each node to separate
normal cells from abnormal cells. The features used in
the tree are listed in a previous table.

Stage2 Classifier#2 122

This classifier is a binary decision tree that uses
a linear feature combination at each node to separate
artifacts from abnormal cells. The features used in the
tree are listed in a previous table.

A portion of the objects classified as abnormal
cells by the Stage2 94 classifier are normal cells and
artifacts; therefore, the stage3 96 classifier tries to
remove those, leaving a purer set of abnormal cells. A
box filter discards objects that are obviously artifacts
or normal cells. The box filter is followed by a binary
decision tree shown in Figure 4E.

The objects classified as abnormal by Stage2 94
enter stage3 96. The box filter 124 identifies normal
cells and artifacts: the classification of these
objects is final. Objects not classified as normal or
artifact are sent to the classifier 128, which
classifies the object as either normal/artifact or
abnormal. If an object is classified as abnormal, it is
sent to both stage4 98 and the Ploidy classifiers.
Objects classified as normal or artifact by one of the
classifiers in stage3 96 are not sent to other
classifiers.

Input to stage3 96 comprises a set of feature
measurements for each object classified as abnormal by
Stage2 94. Outputs comprise the following:

o    The numbers of objects classified as normal,
     abnormal, and artifact by the box filter, 3
     numbers.
o    The number of objects classified as normal,
     abnormal or artifact at the end of the stage3 96
     classifier.
o    An "active" flag that indicates whether the object
     has a final classification. If an object has been
     classified as normal or artifact it is not active
     anymore and will not be sent to other classifiers.



The features that are used by each of the stage3 96
classifiers are listed in the following tables. They are
categorized by feature properties.

Stage3 Box Filter 124

Feature type                  Feature name(s)

Condensed Feature             condensed_area_percent
Context Density Features      mean_background
                              mean_outer_od
Context Distance Feature      cytoplasm_max
Context Texture Features      big_blur_sk
                              big_blur_ave
                              big_edge_2_dir
                              small_blur_sd
Density Feature               integrated_density_od
Nucleus/Cytoplasm
Relation Feature              cell_semi_isolated
Shape Features                shape_score
                              density_0_1
Size Features                 perimeter
                              area
Texture Features              nonuniform_gray
                              sd_enh
                              nuc_blur_sd
                              texture_range

Stage3 Classifier 128

Feature type                  Feature name(s)

Condensed Feature             condensed_compactness
Context Density Features      mean_outer_od
                              mean_background
                              mean_outer_od_r3
Context Texture Features      big_blur_ave
                              big_edge_5_mag
                              sm_edge_9_9
Density Feature               min_od
Shape Feature                 sbx
Texture Features              nuc_edge_2_mag
                              cooc_correlation_1_0
                              cooc_inertia_2_0
                              nonuniform_gray



The stage3 96 classifier is composed of a box 
filter and a binary decision tree. The decision rules 
used in each classifier are as follows: 



Box Filter 124

if (
    perimeter <= 54.5 AND
    mean_background <= 225.265 AND
    big_blur_sk > 1.33969 AND
    mean_background <= 214.015
)
then
    the object is an artifact

else if (
    nonuniform_gray <= 44.5557 AND
    big_blur_ave > 2.91694 AND
    area <= 333.5 AND
    sd_enh > 11.7779 AND
    nuc_blur_sd > 3.53022 AND
    cytoplasm_max <= 11.5
)
then
    the object is an artifact

else if (
    nonuniform_gray <= 35.96 AND
    mean_background <= 225.199 AND
    integrated_density_od <= 31257.5 AND
    texture_range <= 76.5 AND
    condensed_area_percent <= 0.10055
)
then
    the object is an artifact

else if (
    nonuniform_gray <= 44.4472 AND
    mean_background <= 226.63 AND
    integrated_density_od <= 32322.5 AND
    cell_semi_isolated > 0.5
)
then
    the object is an artifact

else if (
    nonuniform_gray <= 44.4472 AND
    mean_background <= 226.63 AND
    integrated_density_od <= 32322.5 AND
    cell_semi_isolated <= 0.5 AND
    shape_score <= 69.4799 AND
    texture_range > 75.5
)
then
    the object is an artifact

if the object was just classified as an artifact:
(
    if (
        big_edge_2_dir <= 0.3891
    )
    then
        the object is abnormal

    else if (
        big_edge_2_dir <= 0.683815 AND
        cytoplasm_max <= 22.5 AND
        mean_background <= 223.051 AND
        sm_blur_sd <= 4.41098 AND
        mean_outer_od <= 38.6805
    )
    then
        the object is abnormal

    else if (
        big_edge_2_dir <= 0.683815 AND
        density_0_1 > 27.5
    )
    then
        the object is abnormal

    else if (
        area > 337.5 AND
        mean_background > 223.66
    )
    then
        the object is abnormal
)

if the object was classified as abnormal
then
    continue the classification process with the
    stage3 96 Classifier.

Stage3 Classifier 128

This classifier is a binary decision tree that uses
a linear feature combination at each node to separate
normal cells and artifacts from abnormal cells. The
features are listed in a previous table.

The main purpose of Stage1-Stage3 is to separate
the populations of normal cells and artifacts from the
abnormal cells. To accomplish this, the decision
boundaries 136 of the classifiers were chosen to
minimize misclassification of populations as shown, for
example, in Figure 4F.

Because the number of normal cells and artifacts on a
slide is far greater than the number of abnormal cells,
the population classified as abnormal by the end of the
stage3 96 classifier still contains some normal cells
and artifacts. For example: assume that the
misclassification rate for normal cells is 0.1%, and 10%
for abnormal cells. If a slide contains 20 abnormal
cells and [...] at the end of the stage3 classifier
makes it difficult to recognize abnormal slides.
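The arithmetic behind this example can be worked through in a short sketch. The 0.1% and 10% rates and the 20 abnormal cells come from the text; the count of 10,000 normal cells is an assumption chosen only to make the calculation concrete, since the slide's normal cell count is not legible here.

```python
def stage3_alarm_mix(n_abnormal, n_normal, normal_error_rate, abnormal_miss_rate):
    """Expected numbers of false and true alarms leaving the stage3 classifier."""
    false_alarms = n_normal * normal_error_rate          # normals called abnormal
    true_alarms = n_abnormal * (1.0 - abnormal_miss_rate)  # abnormals detected
    return false_alarms, true_alarms


# 10_000 normal cells is an assumed figure, not a value from the text.
false_alarms, true_alarms = stage3_alarm_mix(
    n_abnormal=20, n_normal=10_000,
    normal_error_rate=0.001, abnormal_miss_rate=0.10,
)
purity = true_alarms / (false_alarms + true_alarms)
```

Under these assumptions roughly 10 normal objects sit alongside roughly 18 true abnormal objects, so more than a third of the alarms are false. On a slide with no abnormal cells the same 10 false alarms would still appear, which is why alarm counts alone separate normal and abnormal slides poorly.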

The Stage4 classifier uses a different decision
making process to remove the remaining normal/artifact
objects from the abnormal population. Stage4 98 takes
the population existing after stage3 96 and identifies
the clearly abnormal population with a minimum
misclassification of the normal cells or artifacts. To
do this, a higher number of the abnormal cells are
missed than was acceptable in the earlier stages, but
the objects that are classified as abnormal do not have
normal cells and artifacts mixed in. The decision
boundary 138 drawn for the stage4 98 classifier is shown
in Figure 4G.

Stage4 is made up of two classifiers. The first
classifier was trained with data from stage3 96 alarms.
A linear combination of features was developed that
best separated the normal/artifact and abnormal classes.
A threshold was set as shown in Figure 4G that produced
a class containing purely abnormal cells 130 and a class
134 containing a mix of abnormal, normal, and artifacts.

The second classifier was trained using the data
that was not classified as abnormal by the first
classifier. A linear combination of features was
developed that best separated the normal/artifact and
abnormal classes. This second classifier is used to
recover some of the abnormal cells lost by the first
classifier.

The input to stage4 98 comprises a set of feature
measurements for each object classified as abnormal by
stage3 96.

The output comprises the classification result of
any object classified as abnormal by stage4 98.

The features that are used by each of the stage4 98
classifiers are listed in the following table. There are
two decision rules that make up the stage4 98
classifier. Each uses a subset of the features listed.

Feature type                  Feature name(s)

Condensed Features            condensed_compactness
Context Texture Features      big_blur_ave
                              nuc_blur_sd_sm
                              big_edge_5_mag
Density Features              nuc_bright_big
                              normalized_integrated_od_r3
                              normalized_integrated_od
Nucleus/Cytoplasm Texture
Contrast Features             nuc_edge_9_9_big
Texture Features              nonuniform_gray
                              texture_range4
                              below_autothresh_enh2

Decision Rules of stage4 98

The classifier follows these steps:

1.   Create the first linear combination of feature
     values.
2.   If the value of the combination is >= a threshold,
     the object is classified as abnormal, otherwise it
     is classified as normal.
3.   If the object was classified as normal, create the
     second linear combination.
4.   If the value of this second combination is greater
     than a threshold, the object is classified as
     abnormal, otherwise it is classified as normal.

combination1 =   nonuniform_gray * 2.047321387e-02
               + big_blur_ave * 6.059888005e-01
               + nuc_edge_9_9_big * 8.407871425e-02
               + big_edge_5_mag * -3.132035434e-01
               + nuc_blur_sd_sm * 7.260803580e-01

if combination1 >= 3.06, the object is abnormal.
if combination1 < 3.06, compute combination2:

combination2 =   condensed_compactness * 2.957029501e-03
               + nonuniform_gray * 7.682010997e-03
               + below_autothresh_enh2 * 3.975555301e-01
               + nuc_bright_big * -9.175372124e-01
               + normalized_integrated_od_r3 * -4.740774966e-05
               + normalized_integrated_od * 4.612372868e-05
               + texture_range4 * -2.707793610e-03

if combination2 >= -0.13 the object is abnormal.
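The two-step rule can be sketched in code using the coefficients and thresholds printed above. The dictionary layout and function names are assumptions made for illustration, and the first comparison is read as `>=`, which is inferred from the complementary `< 3.06` test rather than printed legibly.

```python
# Weights for the first linear combination (threshold 3.06).
COMBINATION1 = {
    "nonuniform_gray": 2.047321387e-02,
    "big_blur_ave": 6.059888005e-01,
    "nuc_edge_9_9_big": 8.407871425e-02,
    "big_edge_5_mag": -3.132035434e-01,
    "nuc_blur_sd_sm": 7.260803580e-01,
}
THRESHOLD1 = 3.06

# Weights for the second linear combination (threshold -0.13).
COMBINATION2 = {
    "condensed_compactness": 2.957029501e-03,
    "nonuniform_gray": 7.682010997e-03,
    "below_autothresh_enh2": 3.975555301e-01,
    "nuc_bright_big": -9.175372124e-01,
    "normalized_integrated_od_r3": -4.740774966e-05,
    "normalized_integrated_od": 4.612372868e-05,
    "texture_range4": -2.707793610e-03,
}
THRESHOLD2 = -0.13


def linear_combination(weights, features):
    """Weighted sum over the features named in weights."""
    return sum(w * features[name] for name, w in weights.items())


def stage4_classify(features):
    """First rule catches clear abnormals; second rule recovers some misses."""
    if linear_combination(COMBINATION1, features) >= THRESHOLD1:
        return "abnormal"
    if linear_combination(COMBINATION2, features) >= THRESHOLD2:
        return "abnormal"
    return "normal"


# Start from an all-zero feature vector and vary one feature at a time.
zeros = {name: 0.0 for name in {**COMBINATION1, **COMBINATION2}}
by_first_rule = stage4_classify(dict(zeros, big_blur_ave=10.0))
by_second_rule = stage4_classify(dict(zeros, nuc_bright_big=1.0))
```

The second combination is evaluated only when the first does not fire, so its extra features are needed only for the objects the first rule leaves unclassified.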

High grade SIL and cancer cells are frequently
aneuploid, meaning that they contain multiple copies of
sets of chromosomes. As a result, the nuclei of these
abnormal cells stain very dark, and therefore, should be
easy to recognize. The ploidy classifier 100 uses this
stain characteristic to identify aneuploid cells in the
population of cells classified as abnormal by the stage3
96 classifier. The presence of these abnormal cells may
contribute to the final decision as to whether the slide
needs to be reviewed by a human or not.

The ploidy classifier 100 is constructed along the
same lines as the stage4 98 classifier: it is trained on
stage3 96 alarms. The difference is that this classifier
is trained specifically to separate high grade SIL cells
from all other cells: normal, other types of abnormals,
or artifacts.

The ploidy classifier 100 is made up of two
simple classifiers. The first classifier was trained
with data from stage3 96 alarms. A linear combination
of features was developed that best separated the
normal/artifact and abnormal classes. A threshold was
set that produced a class containing purely abnormal
cells and a class containing a mix of abnormal cells,
normal cells, and artifacts.

The second classifier was trained using the data 
classified as abnormal by the first classifier. A 
second linear combination was created to separate 
aneuploid cells from other types of abnormal cells. 

The input to the ploidy classifier 100 comprises
a set of feature measurements for each object
classified as abnormal by stage3 96.

The output comprises the classification
results of any object classified as abnormal by either
classifier in the ploidy classifier 100.

The features used by each of the ploidy 
classifiers 100 are listed in the following table. 
There are two decision rules that make up the ploidy 
classifier 100. Each uses a subset of the features 
listed. 



Feature type                  Feature name(s)

Context Texture Features      big_edge_5_mag
                              big_edge_9_9
                              big_blur_ave

Density Features              normalized_integrated_od
                              nuc_bright_big
                              max_od

Density/Texture Features      auto_mean_diff_orig2

Nucleus/Cytoplasm             mod_N_C_ratio
Relation Features             nc_score_r4

Texture Features              nonuniform_gray
                              texture_range4
                              nuc_blur_sk


Ploidy 100 Decision Rules 

The classifier follows these steps: 

1. Create a linear combination of feature values. 

2. If the value of the combination is >= a 
threshold, the object is classified as abnormal. 

3. If the object was classified as abnormal, create 
a second linear combination. 

4. If the value of this second combination is 
greater than a threshold, the object is 
classified as aneuploid, or highly abnormal. 



combination1 = nonuniform_gray * 7.005183026e-03
             + auto_mean_diff_orig2 * 1.776645705e-02
             + mod_N_C_ratio * 2.493939400e-01
             + nuc_bright_big * -9.405089021e-01
             + normalized_integrated_od * 2.770500259e-06
             + big_blur_ave * 1.802701652e-01
             + big_edge_5_mag * -8.586113900e-02
             + big_edge_9_9 * -1.906895824e-02
             + nuc_blur_sk * -1.124482527e-01
             + max_od * -1.787280198e-03;

if combination1 >= -0.090, the object is classified as
abnormal.

combination2 = big_blur_ave * 2.055980563e-01
             + texture_range4 * -1.174426544e-02
             + nc_score_r4 * 9.785660505e-01;

if combination2 >= 0.63, the object is classified as
aneuploid.

The ploidy classifier 100 was trained on the same
data set as the stage4 98 classifier: 861 normal cells
or artifacts, and 1654 abnormal cells, composed of 725
low grade SIL, and 929 high grade SIL. All objects
were classified as abnormal by the stage3 96
classifier.

The first classifier correctly identified 31.6%
of the abnormal objects, and mistakenly classified 9.4%
of the normal cells and artifacts as abnormal.

The second classifier was trained on all objects
which were classified as abnormal by the first
classifier: 81 normal cells or artifacts, 124 low
grade SIL cells, and 394 high grade SIL cells. The
features were selected to discriminate between low
grade and high grade cells, ignoring the normal cells
and artifacts. The threshold was set using the low
grade, high grade, normal cells and artifacts. It
correctly classified 34.3% of the high grade SIL
cells, and mistakenly classified 14.3% of the low
grade, normal cells or artifacts as abnormal cells.
Stated another way, it classified 26.8% of the
abnormal cells as high grade SIL, and 30.9% of the
normal cells or artifacts as high grade SIL.
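
The ploidy decision rules given earlier can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the weights and thresholds are taken from the ploidy linear combinations in the text, both comparisons are read as "greater than or equal", and the function name, dictionary layout, and returned strings are hypothetical.

```python
# Weights taken from the two ploidy linear combinations in the text.
PLOIDY_WEIGHTS_1 = {
    "nonuniform_gray": 7.005183026e-03,
    "auto_mean_diff_orig2": 1.776645705e-02,
    "mod_N_C_ratio": 2.493939400e-01,
    "nuc_bright_big": -9.405089021e-01,
    "normalized_integrated_od": 2.770500259e-06,
    "big_blur_ave": 1.802701652e-01,
    "big_edge_5_mag": -8.586113900e-02,
    "big_edge_9_9": -1.906895824e-02,
    "nuc_blur_sk": -1.124482527e-01,
    "max_od": -1.787280198e-03,
}
PLOIDY_WEIGHTS_2 = {
    "big_blur_ave": 2.055980563e-01,
    "texture_range4": -1.174426544e-02,
    "nc_score_r4": 9.785660505e-01,
}
PLOIDY_THRESHOLD_1 = -0.090
PLOIDY_THRESHOLD_2 = 0.63


def classify_ploidy(features):
    """Return 'aneuploid', 'abnormal', or 'not abnormal' for one object."""
    first = sum(features[n] * w for n, w in PLOIDY_WEIGHTS_1.items())
    if first < PLOIDY_THRESHOLD_1:
        return "not abnormal"  # the second combination is never evaluated
    second = sum(features[n] * w for n, w in PLOIDY_WEIGHTS_2.items())
    # Assumption: the second comparison is inclusive, like the first.
    return "aneuploid" if second >= PLOIDY_THRESHOLD_2 else "abnormal"
```

Unlike the stage4 rule, the second combination here is applied only to objects the first combination has already called abnormal.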

The purpose of stain evaluation 20 is to evaluate
the quality of stain for a slide and to aid in the
classification of the slide. The stain evaluation 20
for each FOV is accumulated during the 20x slide scan.
This information is used at the end of the slide scan
to do the following:

Judge the quality of the stain.

If the stain of a slide is too different from
that of the slides the apparatus of the invention
was trained on, the performance of the classifier may
be affected, causing objects to be misclassified.

Aid in the classification of the slide.

The stain features derived from the intermediate
cells may be used to normalize other slide features,
such as the density features measured on objects
classified as abnormal. This will help verify whether
the objects classified as abnormal are true abnormal
cells or false alarms.

Referring again to Figures 2 and 4A, the stain
evaluation process 20 is composed of a classifier to
identify intermediate cells and a set of stain-related
features measured for those cells. Intermediate cells
were chosen for use in the stain evaluation 20 because
they have high prevalence in most slides, they are
easily recognized by the segmentation process, and
their stain quality is fairly even over a slide.

The intermediate cell classifier is run early in
the process of the invention, before the majority of
the normal cells have been removed from consideration
by the classifiers. For this reason, the classifier
takes all of the cells classified as normal from the
Stage1 box classifier 112 and determines whether the
cell is an intermediate cell or not.

The intermediate cell classifier takes all
objects identified as normal cells from the Stage1 box
classifier 112 and determines which are well
segmented, isolated intermediate cells. The
intermediate cells will be used to measure the quality
of staining on the slide, so the classifier to detect
them must recognize intermediate cells regardless of
their density. The intermediate cell classifier
contains no density features, so it is stain
insensitive.

The features used by the intermediate cell 
classifier are listed in the following table. 



Feature type                  Feature name(s)

Nucleus/Cytoplasm             mod_N_C_ratio
Relation Features             nc_score_alt_r4
                              cell_semi_isolated

Nuclear Texture Features      nuc_blur_ave

Context Texture Feature       big_blur_ave

Nuclear Size Feature          area2

Shape Features                compactness
                              area_inner_edge

The intermediate cell classifier is composed of
two classifiers. The first classifier is designed to
find intermediate cells with a very low rate of
misclassification for other cell types. It is so
stringent that it classifies only a tiny percentage of
the intermediate cells on the slide as intermediate cells.

To expand the set of cells on which to base the 
stain measurements, a second classifier was added that 
accepts more cells such that some small number of 
cells other than those of intermediate type may be 

included in the set. 

The following are the decision rules for the
first and second classifiers:

if
( mod_N_C_ratio <= 0.073325 and
  nc_score_alt_r4 >= 0.15115 and
  nuc_blur_ave > 4.6846 and
  big_blur_ave <= 4.5655 and
  area2 > 96.5 and
  cell_semi_isolated > 0.5 and
  compactness <= 10.2183 )
the object is an intermediate cell according to the
first classifier;

if
( mod_N_C_ratio <= 0.073325 and
  nc_score_alt_r4 <= 0.15115 and
  nuc_blur_ave > 4.6846 and
  big_blur_ave <= 4.5655 and
  area2 > 96.5 and
  cell_semi_isolated <= 0.5 and
  area_inner_edge <= 138.5 )
the object is an intermediate cell according to the
second classifier.



The stain score generator 20 takes the objects
identified as intermediate squamous cells by the
intermediate cell classifier, fills in histograms
according to cell size and integrated optical density,
and records other stain related features of each cell.

The features used by the stain score generator 21 
are listed in the following table. 



Feature type                  Feature name(s)

Nuclear Optical               integrated_density_od
Density Features              mean_od

Nuclear Size Feature          area

Nucleus/Cytoplasm             nc_contrast_orig
Relation Feature              edge_contrast_orig

Nuclear Texture               sd_orig2
Features                      nuc_blur_ave

Cytoplasm Optical             mean_outer_od_r3
Density Features



Now refer to Figure 5 which shows an example of
a stain histogram 140. The stain histograms 140 are
2-dimensional, with the x-axis representing the size
of the cell, and the y-axis representing the
integrated optical density of the cell. The IOD bins
range from 0 (light) to 7 or 9 (dark). The stain
histogram for the first classifier has 10 IOD bins
while the second has only 8. The size bins range from
0 (large) to 5 (small). There are six stain bins
containing the following size cells:

Size Bin    Size Range

0           221+
1           191 - 220
2           161 - 190
3           131 - 160
4           101 - 130
5           0 - 100

The bin ranges for the integrated optical 
densities of the cells from the first classifier are 
shown in the following table: 



Density Bin    Density Range

0              4,000 - 6,000
1              6,001 - 8,000
2              8,001 - 10,000
3              10,001 - 12,000
4              12,001 - 14,000
5              14,001 - 16,000
6              16,001 - 18,000
7              18,001 - 20,000
8              20,001 - 22,000
9              22,001+

The bin ranges for the integrated optical
densities of the cells from the second classifier
are shown in the following table:

Density Bin    Density Range

0              0 - 4,000
1              4,000 - 8,000
2              8,001 - 12,000
3              12,001 - 16,000
4              16,001 - 20,000
5              20,001 - 24,000
6              24,001 - 28,000
7              28,001+





Each object in the image identified as an
intermediate cell is placed in the size/density
histogram according to its area and integrated
optical density. The first histogram includes
objects classified as intermediate cells by the
first classifier. The second histogram includes
objects classified as intermediate cells by either
the first or second classifier.
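
The histogram filling described above can be sketched as follows. The bin edges are taken from the size and density tables in the text; the function and variable names are hypothetical, and the handling of values outside the tabulated ranges (for example an integrated optical density below 4,000 in the first histogram) is an assumption, since the text does not address it.

```python
import bisect

# Lower edges of size bins 0..5 (bin 0 holds the largest cells).
SIZE_LOWER_EDGES = [221, 191, 161, 131, 101, 0]
# Upper edges of all but the last IOD bin, per the two density tables.
IOD_UPPER_EDGES_FIRST = [6000, 8000, 10000, 12000, 14000,
                         16000, 18000, 20000, 22000]      # 10 bins
IOD_UPPER_EDGES_SECOND = [4000, 8000, 12000, 16000,
                          20000, 24000, 28000]            # 8 bins


def size_bin(area):
    """Return the size bin (0 = large ... 5 = small) for a cell area."""
    for index, lower in enumerate(SIZE_LOWER_EDGES):
        if area >= lower:
            return index
    raise ValueError("area must be non-negative")


def iod_bin(iod, upper_edges):
    """Return the density bin; values above the last edge use the top bin.

    Assumption: values below the first tabulated range fall into bin 0.
    """
    return bisect.bisect_left(upper_edges, iod)


def new_histograms():
    """Two size-by-density count arrays: 6 x 10 and 6 x 8."""
    first = [[0] * 10 for _ in range(6)]
    second = [[0] * 8 for _ in range(6)]
    return first, second


def add_cell(first, second, area, iod, by_first, by_second):
    """Count one object according to which classifier accepted it."""
    if by_first:
        first[size_bin(area)][iod_bin(iod, IOD_UPPER_EDGES_FIRST)] += 1
    if by_first or by_second:
        second[size_bin(area)][iod_bin(iod, IOD_UPPER_EDGES_SECOND)] += 1
```

A cell accepted by the first classifier is counted in both histograms, while a cell accepted only by the second classifier is counted in the second histogram alone.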

The second part of the stain score generator
accumulates several stain measurements for the
objects classified as intermediate cells by either
of the classifiers. The features are:

mean_od
sd_orig2
nc_contrast_orig
mean_outer_od_r3
nuc_blur_ave
edge_contrast_orig

For each of these features, two values are
returned to the computer system 540:

(1) The cumulative total of the feature values for
    all of the intermediate cells. This will be
    used to compute the mean feature value for all
    cells identified as intermediate cells over the
    whole slide.

(2) The cumulative total of the squared feature
    values for all of the intermediate cells. This
    will be used with the mean value to compute the
    standard deviation of the feature value for all
    cells identified as intermediate cells over the
    whole slide.

$$\text{s.d.} = \sqrt{\mu_2 - (\mu)^2}$$

where $(\mu)^2$ is the square of the mean feature
value, and $\mu_2$ is the mean of the squared feature
values.
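
As a small worked illustration of how the two cumulative totals yield the slide-level statistics, the following sketch computes the mean and standard deviation from a running sum and a running sum of squares. The function name is hypothetical, and it assumes the number of intermediate cells is also available, which the text does not state explicitly.

```python
import math


def mean_and_sd(total, total_of_squares, count):
    """Mean and standard deviation from cumulative totals.

    total            -- sum of the feature values over all cells
    total_of_squares -- sum of the squared feature values
    count            -- number of cells (assumed to be tracked as well)
    """
    mean = total / count
    mean_of_squares = total_of_squares / count
    # Guard against a tiny negative value caused by rounding.
    variance = max(mean_of_squares - mean * mean, 0.0)
    return mean, math.sqrt(variance)


# Four cells with feature values 1, 2, 3 and 4.
values = [1.0, 2.0, 3.0, 4.0]
mean, sd = mean_and_sd(sum(values), sum(v * v for v in values), len(values))
```

Only the two totals need to be kept while the slide is scanned; no per-cell values have to be stored.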

Now refer again to Figure 2. The SIL
atypicality index 22 is composed of two measures:
(1) an atypicality measure and (2) a probability
density process (pdf) measure. The atypicality
measure indicates the confidence that the object is
truly abnormal. The pdf measure represents how
similar this object is to others in the training
data set. The combination of these two measures is
used to gauge the confidence that an object
identified as abnormal by the Stage2 94 box
classifier is truly abnormal. The highest weight is
given to detected abnormal objects with high
atypicality and pdf measures, the lowest to those
with low atypicality and pdf measures.

As illustrated in Figure 4A, the atypicality
index 22 takes all objects left after the Stage2 94
box filter and subjects them to a classifier.

The following is a list of the features used by
the atypicality index classifier 22:

nonuniform_gray
nuc_edge_2_mag
compactness2
condensed_compactness
texture_correlation3
nuc_bright_big
mean_background
inertia_2_ratio
nc_score_alt_r3
edge_contrast_orig
mod_N_C_ratio
normalized_mean_od_r3
normalized_mean_od
sd_orig
mod_nuc_OD
sm_edge_9_9
big_blur_ave
big_edge_5_mag
cooc_inertia_4_0
min_od
big_edge_9_9
sm_blur_sd
big_edge_2_dir
sm_bright
area_outer_edge
area
nuc_blur_ave
nuc_blur_sd
perimeter
nuc_blur_sd_sm

The following feature array is composed for the
object to be classified:

Feature_Array[0]  = nonuniform_gray
Feature_Array[1]  = nuc_edge_2_mag
Feature_Array[2]  = compactness2
Feature_Array[3]  = condensed_compactness
Feature_Array[4]  = texture_correlation3
Feature_Array[5]  = nuc_bright_big
Feature_Array[6]  = mean_background
Feature_Array[7]  = inertia_2_ratio
Feature_Array[8]  = nc_score_alt_r3
Feature_Array[9]  = edge_contrast_orig
Feature_Array[10] = mod_N_C_ratio
Feature_Array[11] = normalized_mean_od_r3
Feature_Array[12] = normalized_mean_od
Feature_Array[13] = sd_orig
Feature_Array[14] = mod_nuc_OD
Feature_Array[15] = sm_edge_9_9
Feature_Array[16] = big_blur_ave
Feature_Array[17] = big_edge_5_mag
Feature_Array[18] = cooc_inertia_4_0
Feature_Array[19] = min_od
Feature_Array[20] = big_edge_9_9
Feature_Array[21] = sm_blur_sd
Feature_Array[22] = big_edge_2_dir
Feature_Array[23] = sm_bright
Feature_Array[24] = area_outer_edge
Feature_Array[25] = cc.area
Feature_Array[26] = nuc_blur_ave
Feature_Array[27] = nuc_blur_sd
Feature_Array[28] = perimeter
Feature_Array[29] = nuc_blur_sd_sm



The original feature array is used to derive a
feature vector with 14 elements. Each element
corresponds to an eigenvector of a linear
transformation as determined by discriminant
analysis on the training data set.

The new feature vector is passed to two
classifiers which compute an atypicality index 23
and a pdf index 25. The atypicality index 23
indicates the confidence that the object is truly
abnormal. The pdf index 25 represents how similar
this object is to others in the training data set.
Once the two classification results have been
calculated, they are used to increment a
2-dimensional array for the two measures. The result
returned by each of the classifiers is an integer
number between 1 and 8, with 1 being low confidence
and 8 high confidence. The array contains the
atypicality index on the vertical axis, and the pdf
index on the horizontal axis.
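
The derivation of the 14-element vector and the bookkeeping of the two indices can be sketched as follows. The eigenvector coefficients are not given in the text, so the `eigenvectors` argument is a placeholder the caller must supply; all function names are hypothetical, and the two classifiers that produce the indices are not modelled here.

```python
def project(feature_array, eigenvectors):
    """Derive one element per eigenvector as a dot product.

    feature_array -- the 30 original feature values
    eigenvectors  -- 14 coefficient lists of length 30 (placeholder values;
                     the trained coefficients are not given in the text)
    """
    return [sum(f * w for f, w in zip(feature_array, vector))
            for vector in eigenvectors]


def new_index_array():
    """8 x 8 count array: rows = atypicality index, columns = pdf index."""
    return [[0] * 8 for _ in range(8)]


def record(index_array, atypicality_index, pdf_index):
    """Increment the cell for one object; both indices run from 1 to 8."""
    if not (1 <= atypicality_index <= 8 and 1 <= pdf_index <= 8):
        raise ValueError("indices must be integers between 1 and 8")
    index_array[atypicality_index - 1][pdf_index - 1] += 1
```

Objects that land in the high-atypicality, high-pdf corner of the array are the ones the text describes as receiving the highest weight.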

One indication of a classifier's quality is its
ability to provide the same classification for an
object in spite of small changes in the appearance
or feature measurements of the object. For example,
if the object was re-segmented, and the segmentation
mask changed so that feature values computed using
the segmentation mask changed slightly, the
classification should not change dramatically.

An investigation into the sources of
classification non-repeatability was a part of the
development of the invention. As a result, it was
concluded that there are two major causes of
non-repeatable classification: object presentation
effects and decision boundary effects. As the object
presentation changes, the segmentation changes,
affecting all of the feature measurements, and
therefore, the classification.

Segmentation robustness indicates the
variability of the segmentation mask created for an
object for each of multiple images of the same
object. An object with robust segmentation is one
where the segmentation mask correctly matches the
nucleus and does not vary from image to image in the
case where multiple images are made of the same
object.

The decision boundary effects refer to objects
that have feature values close to the decision
boundaries of the classifier, so small changes in
these features are more likely to cause changes in
the classification result.

Classification decisiveness refers to the
variability in the classification result of an
object as a result of its feature values in
relation to the decision boundaries of the
classifier.

The classification decisiveness measure will be
high if the object's features are far from the
decision boundary, meaning that the classification
result will be repeatable even if the feature values
change by small amounts. Two classifiers were
created to rank the classification robustness of an
object. One measures the classification robustness
as affected by the segmentation robustness. The
other measures the classification robustness as
affected by the classification decisiveness.

The segmentation robustness classifier 24 ranks
how prone the object is to variable segmentation and
the classification decisiveness classifier 26 ranks
the object in terms of its proximity to a decision
boundary in feature space.

Figure 6A illustrates the effect of object
presentation on segmentation. The AutoPap® 300
System uses a strobe to illuminate the FOV. As a
result, slight variations in image brightness occur
as subsequent images are captured. Objects that
have a very high contrast between the nucleus and
cytoplasm, such as the robust object 142 shown in
Figure 6A, tend to segment the same even when the
image brightness varies. Such objects are
considered to have robust segmentation.

Objects that have low contrast, such as the
first two non-robust objects 144 and 146, are more
likely to segment differently when the image
brightness varies; these objects are considered to
have non-robust segmentation. Another cause of
non-robust segmentation is the close proximity of two
objects as is shown in the last non-robust object
148. The segmentation tends to be non-robust
because the segmentation process may group the
objects.

Robust segmentation and classification accuracy
have a direct relationship. Objects with robust
segmentation are more likely to have an accurate
segmentation mask, and therefore, the classification
will be more accurate. Objects with non-robust
segmentation are more likely to have inaccurate
segmentation masks, and therefore, the
classification of the object is unreliable. The
segmentation robustness measure is used to identify
the objects with possibly unreliable classification
results.

Figure 6B illustrates the decision boundary
effect. For objects 154 with features in proximity
to decision boundaries 150, a small amount of
variation in feature values could push objects to
the other side of the decision boundary, and the
classification result would change. As a result,
these objects tend to have non-robust classification
results. On the other hand, objects 152 with
features that are far away from the decision
boundary 150 are not affected by small changes in
feature values and are considered to have more
robust classification results.

The segmentation robustness measure is a
classifier that ranks how prone an object is to
variable segmentation. This section provides an
example of variable segmentation and describes the
segmentation robustness measure.

Variable Segmentation Example:

The invention image segmentation 10 has 11
steps:

1. Pre-processing
2. Histogram statistics
3. Background normalization
4. Enhanced image generation
5. Thresholding image generation
6. Apply thresholding
7. Dark edge incorporation
8. Bright edge exclusion
9. Fill holes
10. Object separation and recovery
11. High threshold inclusion and low value
    pick up

The areas of the segmentation that are most
sensitive to small changes in brightness or contrast
are steps 7, 8, and 9. Figure 6C illustrates the
operation of these steps, which in some cases
can cause the segmentation to be non-robust. Line
(A) shows the object 170 to be segmented, which
comprises two objects close together. Line (B)
shows the correct segmentation of the object 172,
174, 176, and 178 through the dark edge
incorporation, bright edge exclusion, and fill holes
steps of the segmentation process respectively.
Line (C) illustrates a different segmentation
scenario for the same object 182, 184, 186 and 188
that would result in an incorrect segmentation of
the object.

The dark edge incorporation step (7) attempts
to enclose the region covered by the nuclear
boundary. The bright edge exclusion step (8)
attempts to separate nuclear objects and
over-segmented artifacts, and the fill hole step (9)
completes the object mask. This process is
illustrated correctly in line (B) of Figure 6C. If
there is a gap in the dark edge boundary, as
illustrated in line (C), the resulting object mask
188 is so different that the object will not be
considered as a nucleus. If the object is low
contrast or the image brightness changes, the
segmentation may shift from the example on line (B)
to that on line (C).

The input to the segmentation robustness
measure comprises a set of feature measurements
for each object classified as abnormal by the second
decision tree classifier of Stage2 94.

The output comprises a number between 0.0
and 1.0 that indicates the segmentation robustness.
Higher values correspond to objects with more robust
segmentation.

The features were analyzed to determine those
most effective in discriminating between objects
with robust and non-robust segmentation. There were
only 800 unique objects in the training set. To
prevent overtraining the classifier, the number of
features that could be used to build a classifier
was limited. The features chosen are listed in the
following table:

Feature type                  Feature name(s)

Context Distance Feature      min_distance

Context Texture Features      context_3a
                              context_1b
                              sm_bright
                              sm_edge_9_9

Nuclear Density Feature       mean_od

Nuclear Texture Features      hole_percent



This classifier is a binary decision tree that 
uses a linear feature combination at each node to 
separate objects with robust segmentation from those 
with non-robust segmentation. The features 
described in the following list make up the linear 

combination: 

Feature_Array[0] = mean_od
Feature_Array[1] = sm_bright
Feature_Array[2] = sm_edge_9_9
Feature_Array[3] = context_3a
Feature_Array[4] = hole_percent
Feature_Array[5] = context_1b
Feature_Array[6] = min_distance

The features that are sent to each node of the 
tree are identical, but the importance of each 
feature at each of the nodes may be different; the 
importance of each feature was determined during the 

training process. 

The tree that specifies the decision path is
called the Segmentation Robustness Measure
Classifier. It defines the importance of each
feature at each node and the output classification
at each terminal node.

The classification result is a number between
0.0 and 1.0 indicating a general confidence in the
robustness, where 1.0 corresponds to high
confidence.

The classifier was trained using 2373 objects
made up of multiple images of approximately 800
unique objects where 1344 objects were robust and
1029 were non-robust.

The performance of the classifier is shown in
the following table:

                 Robust    Non-Robust

Robust           1128      216

Non-Robust       336       693




The vertical axis represents the true robustness of
the object, and the horizontal axis represents the
classification result. For example, the top row of
the table shows the following:

o 1128 objects with robust segmentation were
  classified correctly as robust.
o 216 objects with robust segmentation were
  classified incorrectly as non-robust.

The classifier correctly identified 77% of the
objects as either having robust or non-robust
segmentation.
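
The 77% figure can be checked directly from the four counts in the table. The short sketch below does that arithmetic; the variable names are hypothetical and the counts are those given in the text.

```python
# Rows are the true class, columns the classification result.
confusion = {
    "robust":     {"robust": 1128, "non_robust": 216},
    "non_robust": {"robust": 336,  "non_robust": 693},
}

total = sum(sum(row.values()) for row in confusion.values())
correct = confusion["robust"]["robust"] + confusion["non_robust"]["non_robust"]
accuracy = correct / total  # fraction of objects on the diagonal

true_robust = sum(confusion["robust"].values())
true_non_robust = sum(confusion["non_robust"].values())
```

The row totals also reproduce the 1344 robust and 1029 non-robust training objects stated for the classifier.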

The confidence measure is derived from the
classification results of the decision tree.
Therefore, using the confidence measures should
provide approximately the same classification
performance as shown in the preceding table.

The classification decisiveness measure
indicates how close the value of the linear
combination of features for an object is to the
decision boundary of the classifier. The
decisiveness measure is calculated from the binary
decision trees used in the final classifiers of
Stage2 94 and stage3 96 by adding information to the
tree to make it a probabilistic tree.

The probabilistic tree assigns probabilities to 
the left and right classes at each decision node of 
the binary decision tree based on the proximity of 
the feature linear combination value to the decision 
boundary. When the linear combination value is 
close to the decision boundary, both left and right 
classes will be assigned a similar low decisiveness 
value. When the linear combination value is away 
from the decision boundary, the side of the tree 
corresponding to the classification decision will 
have high decisiveness value. The combined 
probabilities from all the decision nodes are used
to predict the repeatability of classification for
the object.

A probabilistic Fisher's decision tree (PFDT)
is the same as a binary decision tree, with the
addition of a probability distribution in each
non-terminal node. An object classified by a binary
decision tree would follow only one path from the
root node to a terminal node. The object classified
by the PFDT will have a classification result based
on the single path, but the probability of the
object ending in each terminal node of the tree is
also computed, and the decisiveness is based on
those probabilities.

Figures 7A and 7B show how the decisiveness
measure is computed. The object is classified by
the regular binary decision trees used in Stage2 94
and stage3 96. The trees have been modified as
follows. At each decision node, a probability is
computed based on the distance between the object
and the decision boundary.

At the first decision node, these probabilities
are shown as $p_1$ and $1 - p_1$. The feature values of
the objects which would be entering the
classification node are assumed to have a normal
distribution 190. This normal distribution is
centered over the feature value 194, and the value
of $p_1$ is the area of the normal distribution to the
left of the threshold 192. If the features were
close to the decision boundary, the values of $p_1$ and
$1 - p_1$ indicated by area 196 would be approximately
equal. As the feature combination value drifts to
the left of the decision boundary, the value of $p_1$
increases. Similar probability values are computed
for each decision node of the classification tree as
shown in Figure 7B. The probability associated with
each classification path, the path from the root
node to the terminal node where the classification
result is assigned, is the product of the
probabilities at each branch of the tree. The
probability associated with each terminal node is
shown in Figure 7B. For example, the probability of
the object being classified class1 in the leftmost
branch is $p_1 p_2$. The probability that the object
belongs to one class is the sum of the probabilities
computed for each terminal node of that class. The
decisiveness measure is the difference between the
probability that the object belongs to class1 and
the probability that it belongs to class2.

$$P_{class1} = p_1 p_2 + (1 - p_1)(1 - p_3)$$

$$P_{class2} = p_1 (1 - p_2) + (1 - p_1) p_3$$

$$\text{Decisiveness} = |P_{class1} - P_{class2}|$$
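
The decisiveness computation can be illustrated with a small sketch for a tree of three decision nodes. The tree layout assumed here (a root node and two children, with class1 at the leftmost and rightmost terminal nodes) is an assumption made for illustration, as is the standard deviation of the assumed normal distribution, which the text does not give. Function names are hypothetical.

```python
import math


def left_probability(value, threshold, sd):
    """Area to the left of `threshold` under a normal curve centered on
    the object's linear combination `value` with spread `sd`.

    `sd` is an assumed parameter; the text does not state how it is set.
    """
    z = (threshold - value) / (sd * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z))


def decisiveness(p1, p2, p3):
    """Decisiveness for an assumed three-node tree.

    p1 -- probability of going left at the root
    p2 -- probability of going left at the root's left child
    p3 -- probability of going left at the root's right child
    Assumed terminal classes, left to right: class1, class2, class2, class1.
    """
    p_class1 = p1 * p2 + (1.0 - p1) * (1.0 - p3)
    p_class2 = p1 * (1.0 - p2) + (1.0 - p1) * p3
    return abs(p_class1 - p_class2)
```

An object sitting exactly on every decision boundary gets probabilities of 0.5 at each node and a decisiveness of 0.0, while an object far from all boundaries approaches 1.0.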

The invention computes two classification
decisiveness measures. The first is for objects
classified by the second decision tree classifier of
Stage2 94. The second is for objects classified by
the decision tree classifier of stage3 96. The
classification decisiveness measure is derived as
the object is being classified. The output
comprises the following:

o The classification decisiveness measure for the
  object at Stage2 94 and stage3 96 if the object
  progressed to the stage3 96 classifier. The
  decisiveness measures range from 0.0 to 1.0.
o The product of the classification confidence
  and the classification decisiveness measure for
  the object at Stage2 94 and stage3 96.

The features used for the classification
decisiveness measure are the same as those used for
the second decision tree of Stage2 94 and decision
tree of stage3 96 because the classification
decisiveness measure is produced by the decision
trees.

The decision rules for the classification
decisiveness measure are the same as those used for
the second decision tree of Stage2 94 and decision
tree of stage3 96 because the classification
decisiveness measure is produced by the decision
trees.

Referring again to Figure 2, the miscellaneous
measurements process 26 describes features which are
computed during classification stages of the
invention. They are described here because they can
be grouped together and more easily explained than
they would be in the individual classification stage
descriptions. The following features are described
in this part of the disclosure:

Stage2 Confidence Histogram
Stage3 Confidence Histogram
Stage4 Confidence Histogram
Ploidy Confidence Histogram
Stage2 94 IOD histogram
Stage3 IOD histogram
Contextual Stage1 Alarms
Contextual Stage2 94 Alarms
Addon Feature Information
Estimated Cell Count

Confidence Histograms 

When objects on a slide are classified as
alarms, knowing with what confidence the
classifications occurred may help to determine
whether the slide really is abnormal or not.
Therefore, the following alarm confidence histograms
are computed:

o Stage2 94
o Stage3 96
o Stage4 98

Stage2 94

The classifier for Stage2 94, classifier 2, is a
binary decision tree. The measure of confidence for
each terminal node is the purity of the class at
that node based on the training data used to
construct the tree. For example, if a terminal node
was determined to have 100 abnormal objects and 50
normal objects, any object ending in that terminal
node would be classified as an abnormal object, and
the confidence would be (100 + 1) / (150 + 2) or
0.664.

The 10-bin histogram for Stage2 94 confidences
is filled according to the following confidence
ranges.

    Confidence Bin    Confidence Range
    0                 0.000 - 0.490
    1                 0.500 - 0.690
    2                 0.700 - 0.790
    3                 0.800 - 0.849
    4                 0.850 - 0.874
    5                 0.875 - 0.899
    6                 0.900 - 0.924
    7                 0.925 - 0.949
    8                 0.950 - 0.974
    9                 0.975 - 1.000
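
The terminal-node confidence and the histogram fill described above can be sketched as follows. This is a minimal illustration and not the AutoPap implementation: the function names are hypothetical, and because the printed ranges leave small gaps between bins, the sketch assumes each bin is selected by its lower edge.

```python
import bisect

# Lower edges of the ten confidence bins, read from the table above.
# Assumption: a confidence belongs to the bin with the largest lower edge
# that does not exceed it.
BIN_LOWER_EDGES = [0.000, 0.500, 0.700, 0.800, 0.850,
                   0.875, 0.900, 0.925, 0.950, 0.975]


def node_confidence(majority_count, total_count):
    """Class purity of a terminal node with the +1 / +2 correction
    used in the worked example: (100 + 1) / (150 + 2)."""
    return (majority_count + 1) / (total_count + 2)


def confidence_bin(confidence):
    """Return the histogram bin index (0 to 9) for a confidence in [0, 1]."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must lie in [0, 1]")
    return bisect.bisect_right(BIN_LOWER_EDGES, confidence) - 1


def fill_histogram(confidences):
    """Count how many alarm confidences fall in each of the ten bins."""
    histogram = [0] * len(BIN_LOWER_EDGES)
    for confidence in confidences:
        histogram[confidence_bin(confidence)] += 1
    return histogram
```

With the worked example, node_confidence(100, 150) evaluates to about 0.664, which falls in bin 1.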



Stage3 

The confidence of the Stage3 96 classifier is
determined in the same manner as the Stage2 94
classifier. The confidence histogram bin ranges are
also the same as for the Stage2 94 classifier.

Stage4

Figure 8 illustrates how the confidence is
computed for the Stage4 98 classifier. The
classification process is described in the object
classification 14 Stage4 98 section. If the object
is classified as abnormal at steps 204/203 by the
first classifier that uses the feature combination 1
step 202, the probability is computed in step 210 as
described below. The object will not go to the
second classifier, so the probability for the second
classifier is set to 1.0 in step 212, and the final
confidence is computed in step 216 as the product of
the first and second probabilities. If the object
was classified as normal at step 204 and step 201 by
the first classifier, the probability is computed,
and the object goes to the second classifier that
uses the feature combination 2 step 206. If the
object is classified as abnormal by the second
classifier at step 208 and step 205, the probability
is computed in step 214 for that classifier, and the
final confidence is computed as the product of the
first and second probabilities in step 216. If the
object is classified as normal by the second
classifier, no confidence is reported for the
object.

To determine the confidence of the 
classification results in stage4 98, the mean and 
standard deviations of the linear combinations of 
the normal/artifact and abnormal populations were 
calculated from the training data. These 
calculations were done for the feature combination 1 
step 202 and feature combination 2 step 206. The 
results are shown in the following table: 





                             Feature          Feature
                             Combination 1    Combination 2
    Normal/Artifact mean     2.55             -0.258
    Normal/Artifact sd       0.348            0.084
    Abnormal mean            2.80             4.201
    Abnormal sd              0.403            0.095



Using the means and standard deviations
calculated, the normal and abnormal likelihoods are
computed for feature combination 1:

$$\text{normal\_likelihood} = \frac{(\text{object\_value} - \text{norm\_pop\_mean})^2}{\text{norm\_pop\_sd}}$$

$$\text{abnormal\_likelihood} = \frac{(\text{object\_value} - \text{abnorm\_pop\_mean})^2}{\text{abnorm\_pop\_sd}}$$

Compute the likelihood ratio as:

$$\text{likelihood\_ratio} = \frac{\text{norm\_pop\_sd}}{\text{abnorm\_pop\_sd}} \exp\left[0.5\,(\text{abnormal\_likelihood} - \text{normal\_likelihood})\right]$$

Normalize the ratio:

$$\text{prob1} = \frac{\text{likelihood\_ratio}}{1 + \text{likelihood\_ratio}}$$



If the object is classified as normal by the
first classifier and as abnormal by the second
classifier, compute the normalized likelihood ratio
as described previously using the means and standard
deviations from the second feature combination.
This value will be prob2. The confidence value of
an object classified as abnormal by the Stage4 98
classifier is the product of prob1 and prob2, and
should range from 0.0 to 1.0 in value. The
confidence value is recorded in a histogram.
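
The Stage4 confidence computation can be sketched as below. The sketch follows the likelihood and ratio expressions literally as printed in this disclosure (including dividing by the standard deviation itself and the order of the two terms in the exponent), which may differ from a textbook Gaussian likelihood ratio. The function names, the overflow guard, and the dictionary layout are assumptions made for illustration; the statistics are copied from the table above as printed.

```python
import math

# (norm_mean, norm_sd, abnorm_mean, abnorm_sd) for each feature combination,
# copied from the table above as printed.
STAGE4_STATS = {
    1: (2.55, 0.348, 2.80, 0.403),
    2: (-0.258, 0.084, 4.201, 0.095),
}


def normalized_likelihood_ratio(object_value, norm_mean, norm_sd,
                                abnorm_mean, abnorm_sd):
    """Likelihood ratio normalized to the range 0..1, as printed in the text."""
    normal_likelihood = (object_value - norm_mean) ** 2 / norm_sd
    abnormal_likelihood = (object_value - abnorm_mean) ** 2 / abnorm_sd
    exponent = 0.5 * (abnormal_likelihood - normal_likelihood)
    if exponent > 700.0:
        # Assumption: guard against overflow in exp(); the normalized
        # ratio is 1.0 to double precision in this regime.
        return 1.0
    ratio = (norm_sd / abnorm_sd) * math.exp(exponent)
    return ratio / (1.0 + ratio)


def stage4_confidence(combination1, combination2=None):
    """Product of prob1 and prob2. Pass combination2=None when the first
    classifier already called the object abnormal, so prob2 is 1.0."""
    prob1 = normalized_likelihood_ratio(combination1, *STAGE4_STATS[1])
    if combination2 is None:
        prob2 = 1.0
    else:
        prob2 = normalized_likelihood_ratio(combination2, *STAGE4_STATS[2])
    return prob1 * prob2
```

Because each factor lies between 0 and 1, the product also lies between 0.0 and 1.0, as the text requires.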

The confidence histogram has 12 bins. Bin[0]
and Bin[11] are reserved for special cases. If the
values computed for combination 1 or combination 2
fall near the boundaries of the values existing in
the training set, then a confident classification
decision cannot be made about the object. If the
feature combination value of the object is at the
high end of the boundary, increment bin[11] by 1. If
the feature combination value is at the low end,
increment bin[0] by 1. The decision rules for these
cases are stated as follows:

    if ( combination1 > 4.3 || combination2 > 0.08 )
        stage4_prob_hist[11] is incremented.

    if ( combination1 < 1.6 || combination2 < -0.55 )
        stage4_prob_hist[0] is incremented.

If the feature combination values are within
the acceptable ranges, the object's confidence is
recorded in a histogram with the following bin
ranges:

    Confidence Bin    Confidence Range
    1                 0.000 - < 0.500
    2                 0.500 - < 0.600
    3                 0.600 - < 0.700
    4                 0.700 - < 0.750
    5                 0.750 - < 0.800
    6                 0.800 - < 0.850
    7                 0.850 - < 0.900
    8                 0.900 - < 0.950
    9                 0.950 - < 0.975
    10                0.975 - 1.000
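
The selection of a bin in the 12-bin Stage4 histogram can be sketched as follows. The function name is hypothetical, and two points are assumptions not stated in the text: the high-end rule is tested before the low-end rule, and a combination 2 value of None means the second classifier was not evaluated for the object.

```python
import bisect

# Exclusive upper edges of bins 1 through 9; bin 10 covers 0.975 to 1.000.
UPPER_EDGES = [0.500, 0.600, 0.700, 0.750, 0.800, 0.850, 0.900, 0.950, 0.975]


def stage4_hist_bin(combination1, combination2, confidence):
    """Return the index (0 to 11) of the histogram bin to increment."""
    c2_high = combination2 is not None and combination2 > 0.08
    c2_low = combination2 is not None and combination2 < -0.55
    if combination1 > 4.3 or c2_high:
        return 11  # value near the high boundary of the training set
    if combination1 < 1.6 or c2_low:
        return 0   # value near the low boundary of the training set
    return 1 + bisect.bisect_right(UPPER_EDGES, confidence)


def record(histogram, combination1, combination2, confidence):
    """Increment the selected bin of a 12-element histogram in place."""
    histogram[stage4_hist_bin(combination1, combination2, confidence)] += 1
```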






Figure 9 illustrates how the confidence is
computed for the ploidy classifier 100. The
classification process is described in the object
classification 14 Ploidy 100 section of this
document. The object is classified at step 222. If
the object is classified as abnormal, "yes" 221, by
the first classifier that uses the feature
combination 1 step 220, the probability is computed
in step 224 as described below and prob2 is set to 1.0
at step 226. The object is then sent to the second
classifier. At step 230, if the object was
classified as abnormal, "yes" 231, by the second
classifier that uses the feature combination 2 step
228, the probability is computed for that classifier
at step 232, and the final confidence is computed as
the product of the first and second probabilities in
step 234. If the object is classified as normal by
either the first or the second classifier, no
confidence is reported for the object.

To determine the confidence of the 
classification results in the ploidy classifier 100, 
the mean and standard deviations of the linear 
combinations of the normal and abnormal populations 
were calculated from the training data. These 
calculations were done for the feature combination 1 
step 220 and the feature combination 2 step 228. 
The results are shown in the following table: 





                             Feature          Feature
                             Combination 1    Combination 2
                             step 220         step 228
    Normal/Artifact mean     2.55             -0.258
    Normal/Artifact sd       0.348            0.084
    Abnormal mean            2.80             -0.207
    Abnormal sd              0.403            0.095



Using the means and standard deviations 
calculated, the normal and abnormal likelihoods are 
computed for the feature combination 1 step 220: 






$$\text{normal\_likelihood} = \frac{(\text{object\_value} - \text{norm\_pop\_mean})^2}{\text{norm\_pop\_sd}}$$

$$\text{abnormal\_likelihood} = \frac{(\text{object\_value} - \text{abnorm\_pop\_mean})^2}{\text{abnorm\_pop\_sd}}$$

Compute the likelihood ratio as:

$$\text{likelihood\_ratio} = \frac{\text{norm\_pop\_sd}}{\text{abnorm\_pop\_sd}} \exp\left[0.5\,(\text{abnormal\_likelihood} - \text{normal\_likelihood})\right]$$

Normalize the ratio:

$$\text{prob1} = \frac{\text{likelihood\_ratio}}{1 + \text{likelihood\_ratio}}$$

If the object goes to step 2, compute the normalized
likelihood ratio as described above using the means
and standard deviations from the second feature
combination. This value will be prob2. The
confidence value of an object classified as abnormal
by the ploidy classifier 100 is the product of prob1
and prob2, and should range from 0.0 to 1.0 in
value. The confidence value is recorded in a
histogram.

The confidence histogram has 12 bins. Bin[0]
and Bin[11] are reserved for special cases. If the
values computed for combination 1 or combination 2
fall near the boundaries of the values existing in
the training set, then a confident classification
decision cannot be made about the object. If the
feature combination value of the object is at the
high end of the boundary, increment bin[11] by 1.
If the feature combination value is at the low end,
increment bin[0] by 1. The decision rules for these
cases are stated as follows:

    if ( combination1 < -0.60 || combination2 < -0.30 )
        sil_ploidy_prob_hist[0] is incremented.

    if ( combination1 > 0.35 || combination2 > 1.60 )
        sil_ploidy_prob_hist[11] is incremented.

If the feature combination values are within
the acceptable ranges, the object's confidence is
recorded in a histogram with the following bin
ranges:

    Confidence Bin    Confidence Range
    1                 0.000 - < 0.500
    2                 0.500 - < 0.600
    3                 0.600 - < 0.700
    4                 0.700 - < 0.750
    5                 0.750 - < 0.800
    6                 0.800 - < 0.850
    7                 0.850 - < 0.900
    8                 0.900 - < 0.950
    9                 0.950 - < 0.975
    10                0.975 - 1.000



IOD Histograms

When objects are classified as alarms, it is
useful to know their density. Abnormal cells often
have an excess of nuclear materials, causing them to
stain more darkly. Comparing the staining of the
alarms to the staining of the intermediate cells may
help determine the accuracy of the alarms.

Stage2 94

Each object classified as an abnormal cell by
the Stage2 94 classifier is counted in the alarm IOD
histogram. The ranges of the bins are shown in the
following table:

    IOD Bin    Range of Integrated Optical
               Densities per Bin
    0          0 - 11,999
    1          12,000 - 13,999
    2          14,000 - 15,999
    3          16,000 - 17,999
    4          18,000 - 19,999
    5          20,000 - 21,999
    6          22,000 - 23,999
    7          24,000 - 25,999
    8          26,000 - 27,999
    9          28,000 - 29,999
    10         30,000 - 31,999
    11         32,000 - 33,999
    12         34,000 - 35,999
    13         36,000 - 37,999
    14         38,000 - 39,999
    15         40,000 +
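
The mapping from an integrated optical density to its histogram bin can be sketched as follows. This is an illustration only; the function names are hypothetical, and the sketch assumes that bins 1 through 14 are contiguous and 2,000 units wide, with bin 0 collecting everything below 12,000 and bin 15 everything from 40,000 upward.

```python
def iod_bin(iod):
    """Return the alarm IOD histogram bin (0 to 15) for a non-negative IOD."""
    if iod < 0:
        raise ValueError("integrated optical density cannot be negative")
    if iod < 12000:
        return 0
    # 12,000 maps to bin 1, 14,000 to bin 2, and so on, capped at bin 15.
    return min(int(iod // 2000) - 5, 15)


def fill_iod_histogram(alarm_iods):
    """Count the alarm objects that fall in each of the sixteen bins."""
    histogram = [0] * 16
    for iod in alarm_iods:
        histogram[iod_bin(iod)] += 1
    return histogram
```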



Stage3 

The Stage3 96 alarm IOD histogram has the same
format as the Stage2 94 histogram. It represents
the IOD of each object classified as an abnormal
object by the Stage3 96 classifier.

Contextual Alarm Measurements

Abnormal objects tend to form clusters, so it
is useful to measure how many alarmed objects are
close to other alarmed objects. Specifically, the
following contextual measurements are made:

    o Contextual Stage2 94 alarm: the number of
      Stage1 alarms that are close to a Stage2 94
      alarm
    o Contextual Stage3 96 alarm: the number of
      Stage2 94 alarms that are close to a Stage3 96
      alarm

The distance between alarm objects is the Euclidean
distance:

$$\sqrt{\Delta x^{2} + \Delta y^{2}}$$

If a Stage3 96 alarm is contained in an image, the
distance between it and any Stage2 94 alarms is
measured. If any are within a distance of 200, they
are considered close and are counted in the cluster2
feature. This feature's value is the number of
Stage2 94 alarms found close to Stage3 96 alarms.
The same applies to Stage1 alarms found close to
Stage2 94 alarms for the cluster1 feature.

Each object that is close to a higher alarm
object is counted only once. For example, if a
Stage2 94 alarm is close to two Stage3 96 alarms,
the value of cluster2 will be only 1.
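
The contextual alarm count can be sketched as follows. The function name and the representation of alarms as (x, y) coordinate pairs are assumptions for illustration, as is treating a distance of exactly 200 as close; the text does not state the units of the distance.

```python
import math

CLOSE_DISTANCE = 200  # "within a distance of 200"; units are not stated


def contextual_alarm_count(lower_alarms, higher_alarms,
                           max_distance=CLOSE_DISTANCE):
    """Count lower-stage alarms lying close to at least one higher-stage alarm.

    Each lower-stage alarm contributes at most 1, however many
    higher-stage alarms it is close to.
    """
    count = 0
    for lx, ly in lower_alarms:
        if any(math.hypot(lx - hx, ly - hy) <= max_distance
               for hx, hy in higher_alarms):
            count += 1
    return count
```

For example, one lower-stage alarm at (100, 100) that lies near two higher-stage alarms at (150, 150) and (200, 100) yields a count of 1.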
Estimated Cell Count 

The results of the Stage1 classification are
used to estimate the number of squamous cells on the
slide.

If we define the following variables,

    norm = sil_stage1_normal_count1
    abn = sil_stage1_abnormal_count1
    art = sil_stage1_artifact_count1

the estimated cell count is then computed according
to this formula:

    Est_CC = 0.91 + 1.44 ( norm ) - 0.75 ( abn ) + 0.26 ( art )

             0.0021 ( norm^2 ) + 0.083 ( abn^2 ) - 0.0013 ( art^2 )

             0.015 ( norm^2 ) - 0.043 ( norm * abn ) -
             0.016 ( art * abn ) + 0.0016 ( norm * art * abn )

Process performance has been tracked and
validated throughout all stages of classification
training. A cross validation method was adapted for
performance tracking at each stage, in which
training data is randomly divided into five equal
sets. A classifier is then trained on four of the
five sets and tested on the remaining set. Sets are
rotated and the process is repeated until every
combination of four sets has been used for training
and each set has been used once for testing:

    Training data          Test set
    sets 1, 2, 3 & 4       5
    sets 2, 3, 4 & 5       1
    sets 3, 4, 5 & 1       2
    sets 4, 5, 1 & 2       3
    sets 5, 1, 2 & 3       4
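
The rotation of training and test sets can be sketched as follows. The function names, the seeded shuffle, and the round-robin dealing of items into sets are assumptions for illustration; the disclosure states only that the data is randomly divided into five equal sets.

```python
import random


def split_into_sets(items, n_sets=5, seed=0):
    """Randomly divide items into n_sets groups of (nearly) equal size."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::n_sets] for i in range(n_sets)]


def rotation(n_sets=5):
    """Return (training set numbers, test set number) pairs, numbered
    from 1, in the same order as the table above."""
    schedule = []
    for start in range(1, n_sets + 1):
        training = [((start - 1 + k) % n_sets) + 1 for k in range(n_sets - 1)]
        test = ((start - 2) % n_sets) + 1
        schedule.append((training, test))
    return schedule
```

Each set number appears exactly once as the test set, so every object in the training data is tested by a classifier that was not trained on it.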



The classification merit (CM) gain is used to
measure the performance of the apparatus of the
invention at each stage:

$$\mathrm{CM} = \frac{\text{Sensitivity}}{\text{FPR}}$$

where Sensitivity is the percentage of abnormal
cells correctly classified as abnormal, and FPR is the
false positive rate, or the percentage of normal
cells and artifacts incorrectly classified as
abnormal cells.

The objects that were classified as abnormal in
the previous stage continue to a further stage of
classification. This stage will refine the
classification produced by the previous stage,
eliminating objects that were incorrectly classified
as abnormal. This increases the CM gain. The goal
for the apparatus of the invention is CM gain = 200.

CM Calculation Example:

A typical normal slide might contain 1,000
significant objects that are normal cells. The goal
for the artifact retention rate is 0.2%.

A low prevalence abnormal slide might contain
the same number of normal cells, along with ten
significant single abnormal cells. Of the abnormal
slide's ten significant abnormal objects, it is
expected that the 4x process can select five objects
for processing by the invention. Object
classification 14 that has a 40% abnormal cell
sensitivity reduces this number to 2 (5 x 40% = 2).

$$\mathrm{CM} = \frac{40}{0.20} = 200$$
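
The arithmetic of this example can be checked with a short sketch. The function names are hypothetical; both rates are expressed in percent, as in the definition of CM.

```python
def cm_gain(sensitivity_pct, fpr_pct):
    """Classification merit gain: sensitivity divided by false positive rate."""
    if fpr_pct <= 0:
        raise ValueError("false positive rate must be positive")
    return sensitivity_pct / fpr_pct


def expected_alarms(object_count, rate_pct):
    """Expected number of objects retained at a given percentage rate."""
    return object_count * rate_pct / 100.0


# Figures taken from the example above.
false_alarms = expected_alarms(1000, 0.2)  # normal objects retained as alarms
true_alarms = expected_alarms(5, 40)       # abnormal objects retained as alarms
gain = cm_gain(40, 0.2)
```

Under these figures both slide types are expected to yield two alarms per slide, and the gain evaluates to 200.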



For process performance, the CM gain is
expected to fall within the range of 200 ± 10, and
sensitivity is expected to be within the bounds of
40 ± 10. Results of cross validated testing for
each stage are illustrated in Table 5.1, which shows
an overall CM gain of 192.63 and an overall sensitivity
of 32.4%, each of which falls within the range of our
goal.

The invention Feature Descriptions 

This section contains names and descriptions of
all features that can be used for object
classification 14. Not all features are used by the
object classification 14 process. Those features
that are used by the invention are listed in feature
sets.

The feature names are taken from the
TwentyXFeatures_s structure in the AutoPap® 300
software implementation.

Items shown in bold face are general
descriptions that explain a set of features. Many
features are variations of similar measures, so an
explanation block may precede a section of similar
features.
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Type Feature Description 

int label_cc: A unique numeric label assigned
to each segmented object. The object in the upper-left
corner is assigned a value of 1. The remaining
objects are labeled 2, 3, etc. from left to right
and top to bottom.

int x0: Upper left x coord. of the corner of
the box which contains the object region of
interest.

int y0: Upper left y coord. of the corner of
the box which contains the object region of
interest.

int x1: Lower right x coord. of the corner of
the box which contains the object region of
interest.

int y1: Lower right y coord. of the corner of
the box which contains the object region of
interest.

float area: Number of pixels contained in the
labeled region.

float sch: A measure of shape defined as:
    x = x1 - x0 + 1
    y = y1 - y0 + 1
    sch = 100 * abs(x - y) / (x + y)

float sbx: A measure of shape defined as:
    x = x1 - x0 + 1
    y = y1 - y0 + 1
    sbx = 10 * x * y / area
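
The two bounding-box shape measures sch and sbx can be sketched as follows. The function name is hypothetical; the arithmetic follows the definitions given for these two features.

```python
def box_shape_features(x0, y0, x1, y1, area):
    """Return (sch, sbx) for an object with the given bounding box and area."""
    x = x1 - x0 + 1  # bounding box width in pixels
    y = y1 - y0 + 1  # bounding box height in pixels
    sch = 100 * abs(x - y) / (x + y)  # 0 for a square box, larger when elongated
    sbx = 10 * x * y / area           # 10 when the object fills its box
    return sch, sbx
```

An object that completely fills a 10 x 10 box gives sch = 0 and sbx = 10.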

int stage1_label: The classification label
assigned to the object by the stage1 classifier.

int stage2_label: The classification label
assigned to the object by the stage2 94 classifier.

int stage3_label: The classification label
assigned to the object by the stage3 96 classifier.

float area2: Same feature as area except the
area of interest (labeled region) is first eroded by
a 3x3 element (1-pixel).

float area_inner_edge: Number of pixels in the
erosion residue using a 5x5 element on the labeled
image (2-pixel inner band).

float area_outer_edge: Number of pixels in the
5x5 dilation residue minus a 5x5 closing of the
labeled image (approx. 2-pixel outer band).

float auto_mean_diff_orig2: autothresh_orig2 -
mean_orig2.

float auto_mean_diff_enh2: autothresh_enh2 -
mean_enh2.

float autothresh_enh: These features are
computed in the same way as autothresh_orig except
the enhanced image is used instead of the original
image.

float autothresh_enh2: These features are
computed in the same way as autothresh_orig2 except
the enhanced image is used instead of the original
image.

float autothresh_orig: This computation is based
on the assumption that original image gray scale
values within the nuclear mask are bimodally
distributed. This feature is the threshold that
maximizes the value of "variance-b" given in
equation 18 in the paper by N. Otsu titled "A
threshold selection method from gray-level
histograms", IEEE Trans. on Systems, Man, and
Cybernetics, vol. SMC-9, no. 1, January, 1979.

float autothresh_orig2: The same measurement
except gray scale values are considered within a
nuclear mask that has first been eroded by a 3x3
element (1-pixel).
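
A threshold of this kind can be sketched as follows. This is a generic implementation of Otsu's between-class variance criterion and is not taken from the AutoPap software; the function name, the 256-level gray scale, the convention that the returned threshold is the highest gray level of the darker class, and the choice of the lowest level among ties are all assumptions.

```python
def otsu_threshold(gray_values, levels=256):
    """Return the gray level t that maximizes the between-class variance
    when integer pixels <= t form one class and pixels > t the other."""
    histogram = [0] * levels
    for value in gray_values:
        histogram[value] += 1
    total = len(gray_values)
    sum_all = sum(level * count for level, count in enumerate(histogram))

    best_threshold, best_variance = 0, -1.0
    weight0, sum0 = 0, 0
    for t in range(levels):
        weight0 += histogram[t]
        sum0 += t * histogram[t]
        weight1 = total - weight0
        if weight0 == 0:
            continue  # the darker class is still empty
        if weight1 == 0:
            break     # the brighter class is empty from here on
        mean0 = sum0 / weight0
        mean1 = (sum_all - sum0) / weight1
        # Proportional to the between-class variance (not divided by total**2).
        variance_b = weight0 * weight1 * (mean0 - mean1) ** 2
        if variance_b > best_variance:
            best_threshold, best_variance = t, variance_b
    return best_threshold
```

For a perfectly bimodal set of pixels, every threshold between the two modes gives the same between-class variance, so the sketch returns the lowest such level.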

float below_autothresh_enh2: (count of pixels <
autothresh_enh2) / area2

float below_autothresh_orig2: (count of pixels <
autothresh_orig2) / area2

float compactness: perimeter * perimeter / area

float compactness2: perimeter2 * perimeter2 /
area

float compactness_alt: perimeter2 / nuclear_max

Condensed 

For the condensed features, condensed pixels are
those whose optical density value is:

    > ftCondensedThreshold * mean_od.

ftCondensedThreshold is a global floating point
variable that can be modified (default is 1.2).

float condensed_percent: Sum of the condensed
pixels divided by the total object area.

float condensed_area_percent: The number of
condensed pixels divided by the total object area.

float condensed_ratio: Average optical density
values of the condensed pixels divided by the
mean_od.

float condensed_count: The number of components
generated from a 4-point connected components
routine on the condensed pixels.

float condensed_avg_area: The average area
(pixel count) of all the condensed components.

float condensed_compactness: The total number of
condensed component boundary pixels squared, divided
by the total area of all the condensed components.

float condensed_distance: The sum of the squared
Euclidean distance of each condensed pixel to the
center of mass, divided by the area.
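
Three of the condensed features can be sketched from a flat list of optical density values inside the nuclear mask. The function name and the dictionary layout are hypothetical. The sketch also assumes that the "sum of the condensed pixels" means the sum of their optical density values, and that condensed_ratio is 0.0 when no pixel is condensed.

```python
def condensed_features(od_values, threshold_factor=1.2):
    """Compute condensed_percent, condensed_area_percent and condensed_ratio.

    od_values: optical density of every pixel in the object (non-empty).
    threshold_factor: plays the role of ftCondensedThreshold (default 1.2).
    """
    area = len(od_values)
    mean_od = sum(od_values) / area
    condensed = [v for v in od_values if v > threshold_factor * mean_od]
    condensed_sum = sum(condensed)
    if condensed:
        ratio = (condensed_sum / len(condensed)) / mean_od
    else:
        ratio = 0.0  # assumption: no condensed pixels
    return {
        "condensed_percent": condensed_sum / area,
        "condensed_area_percent": len(condensed) / area,
        "condensed_ratio": ratio,
    }
```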



float cytoplasm_max: The greatest distance
transform value of the cytoplasm image within each
area of interest. This value is found by doing an
8-connect distance transform of the cytoplasm image,
and then finding the largest value within the
nuclear mask.

float cytoplasm_max_alt: The greatest distance
transform value of the cytoplasm image within each
area of interest. The area of interest for
cytoplasm_max is the labeled image while the area of
interest of cytoplasm_max_alt is the labeled regions
generated from doing a skiz of the labeled image.

float density_0_1: perimeter_out - perimeter

float density_1_2: Difference between the '1'
bin and '2' bin of the histogram described in
perimeter.

float density_2_3: Difference between the '2'
bin and '3' bin of the histogram described in
perimeter.

float density_3_4: Difference between the '3'
bin and '4' bin of the histogram described in
perimeter.

float edge_contrast_orig: First a gray scale
dilation is calculated on the original image using a
5x5 structure element. The gray-scale residue is
then computed by subtracting the original image from
the dilation. edge_contrast_orig is the mean of the
residue in a 2-pixel outer ring minus the mean of
the residue in a 2-pixel inner ring (the ring refers
to the area of interest; see area_outer_edge).

float integrated_density_enh: Summation of all
gray-scale valued pixels within an area of interest
(values taken from enhanced image). Value is summed
from the conditional histogram of image.

float integrated_density_enh2: The same
measurement as the last one except the area of
interest is first eroded by a 3x3 element (1-pixel).

float integrated_density_od: Summation of all
gray-scale valued pixels within an area of interest
(values taken from the od image). The od (optical
density) image is generated in this routine using
the feature processor to do a look-up table
operation. The table of values used can be found in
the file fov_features.c initialized in the static
int array OdLut.

float integrated_density_od2: The same
measurement as the last one except the area of
interest is first eroded by a 3x3 element (1-pixel).

float integrated_density_orig: Summation of all
gray-scale valued pixels within an area of interest
(values taken from original image). Value is summed
from the conditional histogram of image.

float integrated_density_orig2: The same
measurement as the last one except the area of
interest is first eroded by a 3x3 element (1-pixel).



float mean_background: Calculates the average
gray-scale value for pixels not on the cytoplasm
mask.

float mean_enh: Mean of the gray-scale valued
pixels within an area of interest. Calculated
simultaneously with integrated_density_enh from the
enhanced image.

float mean_enh2: The same measurement as the
last one except the area of interest is first eroded
by a 3x3 element (1-pixel).

float mean_od: The mean of gray-scale values in
the od image within the nuclear mask.

float mean_od2: The same measurement as the last
one except the area of interest is first eroded by a
3x3 element (1-pixel).

float mean_orig: Mean of gray-scale valued
pixels within an area of interest. Calculated
simultaneously with integrated_density_orig from the
original image.

float mean_orig2: The same measurement as
mean_orig except the area of interest is first
eroded by a 3x3 element (1-pixel).

float mean_outer_od: The mean of the optical
density image is found in an area produced by
finding a 5x5 dilation residue minus a 5x5 closing
of the nuclear mask (2-pixel border).

float normalized_integrated_od: First subtract
mean_outer_od from each gray-scale value in the od
image. This produces the "reduced values". Next
find the sum of these reduced values in the area of
the nuclear mask.

float normalized_integrated_od2: The same
summation described with the last feature computed
in the area of the nuclear mask eroded by a 3x3
element (1-pixel).

float normalized_mean_od: Computed with the
reduced values formed during the calculation of
normalized_integrated_od: find the mean of the
reduced values in the nuclear mask.

float normalized_mean_od2: Same calculation as
normalized_mean_od, except the nuclear mask is first
eroded by a 3x3 structure element (1-pixel).

float nc_contrast_orig: Mean of gray-values in
outer ring minus mean_orig2.

float nc_score: Nuclear-cytoplasm ratio. nc_score
= nuclear_max / cytoplasm_max.

float nc_score_alt: Nuclear-cytoplasm
ratio. nc_score_alt = nuclear_max / cytoplasm_max_alt.

float nuclear_max: The greatest 4-connect
distance transform value within each labeled region.
This is calculated simultaneously with perimeter and
compactness using the distance transform image.

float perimeter: A very close approximation to
the perimeter of a labeled region. It is calculated
by doing a 4-connect distance transform, and then a
conditional histogram. The '1' bin of each
histogram is used as the perimeter value.

float perimeter_out: The "outside" perimeter of
a labeled region. It is calculated by doing a
dilation residue of the labeled frame using a 3x3
(1-pixel) element followed by a histogram.

float perimeter2: The average of perimeter and
perimeter_out.

float region_dy_range_enh: The bounding box of
the region of interest is divided into a 3x3 grid (9
elements). If either side of the bounding box is
not evenly divisible by 3, then either the dimension
of the center grid or the 2 outer grids is
increased by one so that there is an integral
number of pixels in each grid space. A mean is
computed for the enhanced image in the area in
common between the nuclear mask and each grid space.
The region's dynamic range is the maximum of the
means for each region minus the minimum of the means
for each region.
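
The 3x3 gridding and the dynamic range can be sketched as follows. The function names are hypothetical. The sketch assumes that a remainder of 1 enlarges the center grid and a remainder of 2 enlarges the two outer grids, and that grid spaces containing no mask pixels are skipped; neither detail is spelled out in the description.

```python
def grid_sizes(length):
    """Split one side of the bounding box into three integral grid sizes."""
    base, remainder = divmod(length, 3)
    if remainder == 1:
        return [base, base + 1, base]      # center grid one pixel larger
    if remainder == 2:
        return [base + 1, base, base + 1]  # two outer grids one pixel larger
    return [base, base, base]


def region_dynamic_range(image, mask):
    """Maximum minus minimum of the per-grid means of masked pixels.

    image, mask: equally sized 2-D lists covering the bounding box;
    mask entries are truthy inside the nuclear mask.
    """
    means = []
    row_start = 0
    for height in grid_sizes(len(image)):
        col_start = 0
        for width in grid_sizes(len(image[0])):
            values = [image[r][c]
                      for r in range(row_start, row_start + height)
                      for c in range(col_start, col_start + width)
                      if mask[r][c]]
            if values:  # assumption: empty grid spaces are ignored
                means.append(sum(values) / len(values))
            col_start += width
        row_start += height
    return max(means) - min(means) if means else 0.0
```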

float sd_difference: Difference of the two
standard deviations. sd_difference = sd_orig -
sd_enh.

float sd_enh: Standard deviation of pixels in an
area of interest. Calculated simultaneously with
integrated_density_enh from the enhanced image.

float sd_enh2: The same measurement as sd_enh
except the area of interest is first eroded by a 3x3
element (1-pixel).

float sd_orig: Standard deviation of pixels in
an area of interest. Calculated simultaneously with
integrated_density_orig from the original image.

float sd_orig2: The same measurement as sd_orig
except the area of interest is first eroded by a
3x3 element (1-pixel).

float shape_score: Using the 3x3 gridded regions
described in the calculation of region_dy_range_enh,
the mean grayscale value of pixels in the object
mask in each grid is found. Four quantities are
computed from those mean values: H, V, Lr, and Rl.

For H: Three values are computed as the sum of 
the means for each row. H is then the maximum row 
value - minimum row value. 

For V: Same as for H, computed on the vertical 
columns of the grid. 

For Lr: One value is the sum of the means for 
the diagonal running from the top left to the bottom 
right. The other two values are computed as the sum 
of the three means on either side of this diagonal. 
The value of Lr is the maximum - minimum value for 
the three regions. 

For Rl: Same as Lr, except that the diagonal 
runs from bottom-left to top-right. 

$$\text{Shape\_Score} = \sqrt{V^{2} + H^{2} + Lr^{2} + Rl^{2}}$$

float perim_out_r3: The "outside" perimeter of a
labeled region determined by doing a 4-connect
distance transform of the labeled image. The number
of '1's in each mask is counted to become this
value.

float nc_score_r3: The average value of the 8-connect
distance transform of the cytoplasm mask is
found inside the 3x3 dilation residue of the nuclear
mask. Call this value X. The feature is then:
nuclear_max / (X + nuclear_max).

float nc_score_alt_r3: Using "X" as defined in
nc_score_r3, the feature is: area / (3.14*X*X).

float nc_score_r4: The median value of the 8-connect
distance transform of the cytoplasm mask is
found inside the 3x3 dilation residue of the nuclear
mask. This value is always an integer since the
discrete probability density process always crosses
0.5 at the integer values. Call this value Y. The
feature is then: nuclear_max / (Y + nuclear_max).

float nc_score_alt_r4: Using "Y" as defined in
nc_score_r4, the feature is: area / (3.14*Y*Y).

float mean_outer_od_r3: The mean value of the
optical density image in a 9x9 (4 pixel) dilation
residue minus a 9x9 closing of the nuclear mask.
The top and bottom 20% of the histogram are not used
in the calculation.

float normalized_mean_od_r3: As in
normalized_mean_od except that the values are
reduced by mean_outer_od_r3.

float normalized_integrated_od_r3: As in
normalized_integrated_od except that the values are
reduced by mean_outer_od_r3.

float edge_density_r3: A gray-scale dilation
residue is performed on the original image using a
3x3 element. The feature is the number of pixels >
10 that lie in the 5x5 erosion of the nuclear mask.

Texture 

In the following texture features, two global
variables can be modified to adjust their
calculation. ftOccurranceDelta is an integer
specifying the distance between the middle threshold
(mean) and the low threshold, and the middle (mean)
and the high threshold. ftOccurranceOffset is an
integer specifying the number of pixels to "look
ahead" or "look down".

To do texture analysis on adjacent pixels, this
number must be 1. To compute the texture features,
the "S" or "co-occurrence matrix" is first defined.
To compute this matrix, the original image is first
thresholded into 4 sets. Currently the thresholds
to determine these four sets are as follows, where M
is the mean_orig: x=1 if x < M-20, x=2 if M-20 <= x < M,
x=3 if M <= x < M+20, x=4 if x >= M+20. The
co-occurrence matrix is computed by finding the number
of transitions between values in the four sets in a
certain direction. Since there are four sets, the
co-occurrence matrix is 4x4. As an example, consider
a pixel of value 1 and its nearest neighbor to the
right, which also has the same value. For this
pixel, the co-occurrence matrix for transitions to
the right would therefore increment in the first
row-column. Since pixels outside the nuclear mask
are not analyzed, transitions are not recorded for
the pixels on the edge. Finally, after finding the
number of transitions for each type in the
co-occurrence matrix, each entry is normalized by the
total number of transitions. texture_correlation
and texture_inertia are computed for four
directions: east, southeast, south, and southwest.
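
The construction of the 4x4 co-occurrence matrix can be sketched as follows. This is an illustration only: the function names, the (row, column) step used for each direction, and the representation of the image and nuclear mask as 2-D lists are assumptions. The delta and offset parameters play the roles of ftOccurranceDelta and ftOccurranceOffset.

```python
# (row step, column step) for each direction; rows increase downward.
DIRECTIONS = {
    "east": (0, 1),
    "southeast": (1, 1),
    "south": (1, 0),
    "southwest": (1, -1),
}


def quantize(value, mean, delta=20):
    """Map a gray value to one of the four sets (1 to 4) around the mean."""
    if value < mean - delta:
        return 1
    if value < mean:
        return 2
    if value < mean + delta:
        return 3
    return 4


def cooccurrence_matrix(image, mask, direction="east", offset=1, delta=20):
    """Return the normalized 4x4 matrix of transitions inside the mask."""
    rows, cols = len(image), len(image[0])
    inside = [(r, c) for r in range(rows) for c in range(cols) if mask[r][c]]
    mean = sum(image[r][c] for r, c in inside) / len(inside)
    step_r, step_c = DIRECTIONS[direction]
    counts = [[0] * 4 for _ in range(4)]
    total = 0
    for r, c in inside:
        r2, c2 = r + step_r * offset, c + step_c * offset
        # Transitions whose neighbor lies outside the mask are not recorded.
        if 0 <= r2 < rows and 0 <= c2 < cols and mask[r2][c2]:
            a = quantize(image[r][c], mean, delta)
            b = quantize(image[r2][c2], mean, delta)
            counts[a - 1][b - 1] += 1
            total += 1
    if total == 0:
        return [[0.0] * 4 for _ in range(4)]
    return [[n / total for n in row] for row in counts]
```

The entries of the returned matrix sum to 1 whenever at least one transition is recorded.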

float texture_correlation: The correlation
process calculation is described on page 187 of
Computer Vision, written by Ballard & Brown,
Prentice-Hall, 1982. Options 2, 3, 4 indicate the
same analysis, except that instead of occurring in
the East direction it occurs in the Southeast, South
or Southwest direction.

float texture_inertia: Also described in
Computer Vision, id.

float texture_range: The difference between the
maximum and minimum gray-scale value in the original
image.

float texture_correlation2: As above, direction
southeast.

float texture_inertia2: As above, direction
southeast.

float texture_range2: As above, direction
southeast.

float texture_correlation3: As above, direction
south.

float texture_inertia3: As above, direction
south.

float texture_range3: As above, direction south.

float texture_correlation4: As above, direction
southwest.

float texture_inertia4: As above, direction
southwest.

float texture_range4: As above, direction
southwest.

COOC 

In the following features utilizing the "co-occurrence" or "S" matrix,
the matrix is derived from the optical density image. To compute this
matrix, the optical density image is first thresholded into six sets
evenly divided between the maximum and minimum OD value of the cell's
nucleus in question. The S or "co-occurrence matrix" is computed by
finding the number of transitions between values in the six sets in a
certain direction. Since we have six sets, the co-occurrence matrix is
6x6. As an example, consider a pixel of value 1 and its nearest neighbor
to the right, which also has the same value. For this pixel, the
co-occurrence matrix for transitions to the right would increment in the
first row-column. Since pixels outside the nuclear mask are not
analyzed, transitions are not recorded for the pixels on the edge.
Finally, after finding the number of transitions for each type in the
co-occurrence matrix, each entry is normalized by the total number of
transitions. The suffixes on these features indicate the position the
neighbor is compared against. They are as follows: _1_0: one pixel to
the east. _2_0: two pixels to the east. _4_0: four pixels to the east.
_1_45: one pixel to the southeast. _1_90: one pixel to the south.
_1_135: one pixel to the southwest.
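The construction described above can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation: the function name `cooc_matrix`, the list-of-lists image layout, and the convention that rows increase toward the south are all assumptions made for the example.

```python
def cooc_matrix(od, mask, dx, dy, levels=6):
    """Normalized co-occurrence matrix for the neighbor offset (dx, dy).

    od   -- 2D list of optical density values (hypothetical layout)
    mask -- 2D list of booleans marking nuclear pixels
    """
    h, w = len(od), len(od[0])
    inside = [od[r][c] for r in range(h) for c in range(w) if mask[r][c]]
    lo, hi = min(inside), max(inside)
    span = (hi - lo) or 1.0  # avoid division by zero for a flat nucleus

    def level(v):
        # Evenly divide [lo, hi] into `levels` sets; hi falls in the last set.
        return min(int((v - lo) / span * levels), levels - 1)

    counts = [[0] * levels for _ in range(levels)]
    total = 0
    for r in range(h):
        for c in range(w):
            r2, c2 = r + dy, c + dx
            if not (0 <= r2 < h and 0 <= c2 < w):
                continue
            # Only transitions between two pixels inside the mask are recorded.
            if mask[r][c] and mask[r2][c2]:
                counts[level(od[r][c])][level(od[r2][c2])] += 1
                total += 1
    if total == 0:
        return counts
    return [[n / total for n in row] for row in counts]
```

Under the assumed axis convention, the suffixes map to offsets as follows: _1_0 is (dx=1, dy=0), _2_0 is (2, 0), _4_0 is (4, 0), _1_45 is (1, 1), _1_90 is (0, 1), and _1_135 is (-1, 1).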

float cooc_energy_1_0: The square root of the energy process described
in Computer Vision, id. Refer to the COOC description above for an
explanation of the 1_0 suffix.

float cooc_energy_2_0: Refer to the COOC description above for an
explanation of the 2_0 suffix.

float cooc_energy_4_0: Refer to the COOC description above for an
explanation of the 4_0 suffix.

float cooc_energy_1_45: Refer to the COOC description above for an
explanation of the 1_45 suffix.

float cooc_energy_1_90: Refer to the COOC description above for an
explanation of the 1_90 suffix.

float cooc_energy_1_135: Refer to the COOC description above for an
explanation of the 1_135 suffix.

float cooc_entropy_1_0: The entropy process defined in Computer Vision,
id. Refer to the COOC description above for an explanation of the 1_0
suffix.

float cooc_entropy_2_0: Refer to the COOC description above for an
explanation of the 2_0 suffix.

float cooc_entropy_4_0: Refer to the COOC description above for an
explanation of the 4_0 suffix.

float cooc_entropy_1_45: Refer to the COOC description above for an
explanation of the 1_45 suffix.

float cooc_entropy_1_90: Refer to the COOC description above for an
explanation of the 1_90 suffix.

float cooc_entropy_1_135: Refer to the COOC description above for an
explanation of the 1_135 suffix.

float cooc_inertia_1_0: The inertia process defined in Computer Vision,
id.

float cooc_inertia_2_0: Refer to the COOC description above for an
explanation of the 2_0 suffix.

float cooc_inertia_4_0: Refer to the COOC description above for an
explanation of the 4_0 suffix.

float cooc_inertia_1_45: Refer to the COOC description above for an
explanation of the 1_45 suffix.

float cooc_inertia_1_90: Refer to the COOC description above for an
explanation of the 1_90 suffix.

float cooc_inertia_1_135: Refer to the COOC description above for an
explanation of the 1_135 suffix.

float cooc_homo_1_0: The homogeneity process described in Computer
Vision, id. Refer to the COOC description above for an explanation of
the 1_0 suffix.

float cooc_homo_2_0: Refer to the COOC description above for an
explanation of the 2_0 suffix.

float cooc_homo_4_0: Refer to the COOC description above for an
explanation of the 4_0 suffix.

float cooc_homo_1_45: Refer to the COOC description above for an
explanation of the 1_45 suffix.

float cooc_homo_1_90: Refer to the COOC description above for an
explanation of the 1_90 suffix.

float cooc_homo_1_135: Refer to the COOC description above for an
explanation of the 1_135 suffix.

float cooc_corr_1_0: The correlation process described in Computer
Vision, id. Refer to the COOC description above for an explanation of
the 1_0 suffix.

float cooc_corr_2_0: Refer to the COOC description above for an
explanation of the 2_0 suffix.

float cooc_corr_4_0: Refer to the COOC description above for an
explanation of the 4_0 suffix.

float cooc_corr_1_45: Refer to the COOC description above for an
explanation of the 1_45 suffix.

float cooc_corr_1_90: Refer to the COOC description above for an
explanation of the 1_90 suffix.

float cooc_corr_1_135: Refer to the COOC description above for an
explanation of the 1_135 suffix.

Run Length 

The next five features are computed using run length features. Similar
to the co-occurrence features, the optical density image is first
thresholded into six sets evenly divided between the maximum and minimum
OD value of the cell's nucleus in question. The run length matrix is
then computed from the lengths and orientations of linearly connected
pixels of identical gray levels. For example, the upper left corner of
the matrix would count the number of pixels of gray level 0 with no
horizontally adjacent pixels of the same gray value. The entry to the
right of the upper left corner counts the number of pixels of gray level
0 with one horizontally adjacent pixel of the same gray level.




float emphasis_short: The number of runs divided by the length of the
run squared:

$$\sum_{i=1}^{\#\,\text{gray}} \sum_{j=1}^{\#\,\text{runs}} \frac{p(i,j)}{j^{2}}$$

p(i,j) is the number of runs with gray level i and length j. This
feature emphasizes short runs, or high texture.

float emphasis_long: The product of the number of runs and the run
length squared:

$$\sum_{i=1}^{\#\,\text{gray}} \sum_{j=1}^{\#\,\text{runs}} p(i,j)\, j^{2}$$

p(i,j) is the number of runs with gray level i and length j. This
feature emphasizes long runs, or low texture.

float nonuniform_gray: The square of the number of runs for each gray
level:

$$\sum_{i=1}^{\#\,\text{gray}} \left( \sum_{j=1}^{\#\,\text{runs}} p(i,j) \right)^{2}$$

The process is at a minimum when the runs are equally distributed among
gray levels.



float nonuniform_run: The square of the number of runs for each run
length:

$$\sum_{j=1}^{\#\,\text{runs}} \left( \sum_{i=1}^{\#\,\text{gray}} p(i,j) \right)^{2}$$

This process is at its minimum when the runs are equally distributed in
length.

float percentage_run: The ratio of the total number of runs to the
number of pixels in the nuclear mask:

$$\frac{\sum_{i=1}^{\#\,\text{gray}} \sum_{j=1}^{\#\,\text{runs}} p(i,j)}{\#\,\text{pixels}}$$

This feature has a low value when the structure of the object is highly
linear.
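The five run length features can be illustrated with the short Python sketch below. It is a sketch under stated assumptions only: it builds p(i,j) as a count of horizontal runs (following the p(i,j) definition given with the formulas), and the function name, dictionary keys, and list-of-lists inputs are hypothetical.

```python
def run_length_features(levels, mask):
    """Compute the five run length features from horizontal runs.

    levels -- 2D list of quantized gray levels (e.g. 0..5)
    mask   -- 2D list of booleans marking nuclear pixels
    """
    p = {}        # p[(i, j)] = number of runs with gray level i and length j
    n_pixels = 0  # number of pixels inside the nuclear mask
    for row_levels, row_mask in zip(levels, mask):
        run_level, run_len = None, 0
        # A trailing sentinel closes any run that reaches the end of the row.
        for lv, m in list(zip(row_levels, row_mask)) + [(None, False)]:
            cur = lv if m else None
            if cur is not None:
                n_pixels += 1
            if cur is not None and cur == run_level:
                run_len += 1
                continue
            if run_level is not None:
                key = (run_level, run_len)
                p[key] = p.get(key, 0) + 1
            run_level, run_len = cur, (1 if cur is not None else 0)

    by_gray, by_len = {}, {}
    for (i, j), n in p.items():
        by_gray[i] = by_gray.get(i, 0) + n
        by_len[j] = by_len.get(j, 0) + n
    return {
        "emphasis_short": sum(n / j ** 2 for (i, j), n in p.items()),
        "emphasis_long": sum(n * j ** 2 for (i, j), n in p.items()),
        "nonuniform_gray": sum(v ** 2 for v in by_gray.values()),
        "nonuniform_run": sum(v ** 2 for v in by_len.values()),
        "percentage": sum(p.values()) / n_pixels if n_pixels else 0.0,
    }
```

For a two-row image with rows [0, 0, 1] and [2, 2, 2], all inside the mask, there are three runs (lengths 2, 1, and 3) over six pixels, so the run percentage is 0.5.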

float inertia_2_min_axis: Minimum axis of the 2nd moment of inertia of
the nuclear region normalized by the area in pixels.

float inertia_2_max_axis: Maximum axis of the 2nd moment of inertia of
the nuclear region normalized by the area in pixels.

float inertia_2_ratio: inertia_2_min_axis / inertia_2_max_axis.

float max_od: Maximum optical density value contained in the nuclear
region.

float min_od: Minimum optical density value contained in the nuclear
region.

float sd_od: Standard deviation of the optical density values in the
nuclear region.



float cell_free_lying: This feature can take on two values: 0.0 and 1.0
(1.0 indicates the nucleus is free lying). To determine if a cell is
free lying, a connected components analysis is done on the cytoplasm
image, filtering out any components smaller than 400 pixels or larger
in size than the integer variable AlgFreeLyingCytoMax (default is
20000). If only one nucleus bounding box falls inside the bounding box
of a labeled cytoplasm, the nucleus (cell) will be labeled free lying
(1.0), else the nucleus will be labeled 0.0.
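The free-lying test can be sketched as follows. This is an illustrative reading of the description, not the actual implementation: the box format (x0, y0, x1, y1), the function names, and the assumption that the connected components step has already produced a pixel count and bounding box per cytoplasm are all hypothetical.

```python
def box_inside(inner, outer):
    """True if box `inner` lies entirely within box `outer`."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and inner[2] <= outer[2] and inner[3] <= outer[3])


def free_lying_flags(nucleus_boxes, cytoplasms, min_size=400, max_size=20000):
    """Return one 0.0/1.0 flag per nucleus.

    cytoplasms -- list of (pixel_count, bounding_box) per labeled component
    max_size   -- stands in for AlgFreeLyingCytoMax
    """
    flags = [0.0] * len(nucleus_boxes)
    for size, cyto_box in cytoplasms:
        if size < min_size or size > max_size:
            continue  # component filtered out by size
        inside = [k for k, nb in enumerate(nucleus_boxes)
                  if box_inside(nb, cyto_box)]
        # Exactly one nucleus inside this cytoplasm: it is free lying.
        if len(inside) == 1:
            flags[inside[0]] = 1.0
    return flags
```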

float cell_semi_isolated: This feature can take on two values: 0.0 and
1.0 (1.0 indicates the nucleus is semi-isolated). A nucleus is
determined to be semi-isolated when the center of its bounding box is at
least a minimum Euclidean pixel distance from all other nuclei (center
of their bounding boxes). The minimum distance that is used as a
threshold is stored in the global floating-point variable
AlgSemiIsolatedDistanceMin on the FOV card (default is 50.0). Only
nuclei with the cc.active field non-zero will be used in distance
comparisons; non-active cells will be ignored entirely.
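A minimal sketch of the semi-isolation test is given below. The function name, the box format (x0, y0, x1, y1), and the choice of a strict "closer than the threshold" comparison are assumptions; the `active` list stands in for the cc.active field.

```python
import math


def semi_isolated_flags(boxes, active, min_distance=50.0):
    """Return one 0.0/1.0 flag per nucleus.

    boxes        -- bounding boxes (x0, y0, x1, y1), one per nucleus
    active       -- truthy where the nucleus takes part in comparisons
    min_distance -- stands in for AlgSemiIsolatedDistanceMin
    """
    centers = [((x0 + x1) / 2.0, (y0 + y1) / 2.0) for x0, y0, x1, y1 in boxes]
    flags = []
    for a, (ax, ay) in enumerate(centers):
        isolated = True
        for b, (bx, by) in enumerate(centers):
            if a == b or not active[b]:
                continue  # skip self and non-active nuclei
            if math.hypot(ax - bx, ay - by) < min_distance:
                isolated = False
                break
        flags.append(1.0 if isolated else 0.0)
    return flags
```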

float cell_cyto_area: If the cell has been determined to be free-lying
(cell_free_lying = 1.0), this number represents the number of pixels in
the cytoplasm (value is approximated due to earlier downsampling). If
the cell is not free-lying, this number is 0.0.

float cell_nc_ratio: If the cell has been determined to be free-lying
(cell_free_lying = 1.0), this number is cc.area / cell_cyto_area. If the
cell is not free-lying, this number is 0.0.

float cell_centroid_diff: This feature is used on free-lying cells. The
centroid of the cytoplasm is calculated, and the centroid of the
nucleus. The feature value is the difference between these two
centroids.

Local Area Context Normalization Features 

The original image nucleus is assumed to contain information not only
about the nucleus, but also about background matter. The gray level
recorded at each pixel of the nucleus will be a summation of the optical
density of all matter in the vertical column that contains the
particular nucleus pixel. In other words, if the nucleus is located in a
cytoplasm which itself is located in a mucus stream, the gray level
values of the nucleus will reflect not only the nuclear matter, but also
the cytoplasm and mucus in which the nucleus lies. To try to measure
features of the nucleus without influence of the surroundings and to
measure the nucleus surroundings, two regions have been defined around
the nucleus. Two regions have been defined because of a lack of
information about how much area around the nucleus is enough to identify
what is happening in proximity to the nucleus.

The two regions are rings around each nucleus. The first ring expands 5
pixels out from the nucleus (box 7x7 and diamond 4) and is designated as
the "small" ring. The second region expands 15 pixels out from the
nucleus (box 15x15 and diamond 9) and is called the "big" ring.

float sm_bright: Average intensity of the pixels in the small ring as
measured in the original image.

float big_bright: Average intensity of the pixels in the big ring as
measured in the original image.

float nuc_bright_sm: Average intensity of the nuclear pixels divided by
the average intensity of the pixels in the small ring.

float nuc_bright_big: Average intensity of the nuclear pixels divided by
the average intensity of the pixels in the big ring.

3x3 

The original image is subtracted from a 3x3 closed version of the
original. The resultant image is the 3x3 closing residue of the
original. This residue gives some indication as to how many dark objects
smaller than a 3x3 area exist in the given region.
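The closing residue can be illustrated with a small pure-Python sketch. It assumes a flat square structuring element and simply ignores neighbors that fall outside the image; both choices, along with the function names, are assumptions made for the example rather than details taken from this description.

```python
def _window_filter(img, k, pick):
    """Apply `pick` (max or min) over a k x k window around each pixel."""
    h, w, r = len(img), len(img[0]), k // 2
    return [[pick(img[y][x]
                  for y in range(max(0, row - r), min(h, row + r + 1))
                  for x in range(max(0, col - r), min(w, col + r + 1)))
             for col in range(w)]
            for row in range(h)]


def closing(img, k):
    """Gray-scale closing: dilation (max) followed by erosion (min)."""
    return _window_filter(_window_filter(img, k, max), k, min)


def closing_residue(img, k=3):
    """Closed image minus the original; dark details smaller than k x k remain."""
    closed = closing(img, k)
    return [[c - v for c, v in zip(crow, vrow)]
            for crow, vrow in zip(closed, img)]
```

A single dark pixel on a bright background is filled in by the closing, so it shows up in the residue with a value equal to its contrast against the background.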

float sm_edge_3_3: Average intensity of the 3x3 closing residue in the
small ring region.

float big_edge_3_3: Average intensity of the 3x3 closing residue in the
big ring region.

float nuc_edge_3_3_sm: Average intensity of the 3x3 closing residue in
the nuclear region divided by the average intensity of the 3x3 closing
residue in the small ring.

float nuc_edge_3_3_big: Average intensity of the 3x3 closing residue in
the nuclear region divided by the average intensity of the 3x3 closing
residue in the big ring.

5x5 

The residue of a 5x5 closing of the original image is done similarly to
the 3x3 closing residue except that the 3x3 closed image is subtracted
from the 5x5 closed image instead of the original. This isolates those
objects between 3x3 and 5x5 in size.

float sm_edge_5_5: Average intensity of the 5x5 closing residue in the
small ring region.

float big_edge_5_5: Average intensity of the 5x5 closing residue in the
big ring region.

float nuc_edge_5_5_sm: Average intensity of the 5x5 closing residue in
the nuclear region divided by the average intensity of the 5x5 closing
residue in the small ring.

float nuc_edge_5_5_big: Average intensity of the 5x5 closing residue in
the nuclear region divided by the average intensity of the 5x5 closing
residue in the big ring.

9x9

The residue of a 9x9 closing of the original image is done in the same
way as the 5x5 closing residue described above, except that the 5x5
closed image, rather than the 3x3 closed image, is subtracted from the
9x9 closed image.

float sm_edge_9_9: Average intensity of the 9x9 closing residue in the
small ring region.

float big_edge_9_9: Average intensity of the 9x9 closing residue in the
big ring region.

float nuc_edge_9_9_sm: Average intensity of the 9x9 closing residue in
the nuclear region divided by the average intensity of the 9x9 closing
residue in the small ring.

float nuc_edge_9_9_big: Average intensity of the 9x9 closing residue in
the nuclear region divided by the average intensity of the 9x9 closing
residue in the big ring.

2 Mag 

To find if an angular component exists as part of the object texture,
closing residues are done in the area of interest using horizontal and
vertical structuring elements. The information is combined as a
magnitude and an angular disparity measure. The first structuring
elements used are a 2x1 and 1x2.

float nuc_edge_2_mag: Magnitude of 2x1 and 1x2 closing residues within
the nuclei. Square root of ((average horizontal residue)^2 + (average
vertical residue)^2).

float sm_edge_2_mag: Magnitude of 2x1 and 1x2 closing residues within
the small ring. Square root of ((average horizontal residue)^2 +
(average vertical residue)^2).

float big_edge_2_mag: Magnitude of 2x1 and 1x2 closing residues within
the big ring. Square root of ((average horizontal residue)^2 + (average
vertical residue)^2).

float nuc_edge_2_mag_sm: nuc_edge_2_mag / sm_edge_2_mag.

float nuc_edge_2_mag_big: nuc_edge_2_mag / big_edge_2_mag.

float nuc_edge_2_dir: Directional disparity of 2x1 and 1x2 closing
residues within the nuclei. (average vertical residue) / ((average
horizontal residue) + (average vertical residue)).

float sm_edge_2_dir: Directional disparity of 2x1 and 1x2 closing
residues in the small ring. (average vertical residue) / ((average
horizontal residue) + (average vertical residue)).

float big_edge_2_dir: Directional disparity of 2x1 and 1x2 closing
residues in the big ring. (average vertical residue) / ((average
horizontal residue) + (average vertical residue)).

float nuc_edge_2_dir_sm: nuc_edge_2_dir / sm_edge_2_dir.

float nuc_edge_2_dir_big: nuc_edge_2_dir / big_edge_2_dir.
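The magnitude and directional disparity formulas used by these features can be written out directly. The sketch below is illustrative only; the function names are hypothetical, and returning 0.0 when both residues are zero is an assumption, since the description does not say how that case is handled.

```python
import math


def edge_magnitude(avg_horizontal, avg_vertical):
    """Square root of the sum of the squared average residues."""
    return math.sqrt(avg_horizontal ** 2 + avg_vertical ** 2)


def edge_direction(avg_horizontal, avg_vertical):
    """Share of the vertical residue in the total; 0.5 means no preferred direction."""
    total = avg_horizontal + avg_vertical
    # Assumed fallback: no residue at all is reported as 0.0.
    return avg_vertical / total if total else 0.0
```

For example, average residues of 3 (horizontal) and 4 (vertical) give a magnitude of 5.0, and residues of 1 and 3 give a directional disparity of 0.75.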

5 Mag 

The structuring elements used are a 5x1 and a 1x5. In this case, the
residue is calculated with the 2x1 or 1x2 closed images rather than the
original as for the 2x1 and 1x2 structuring elements described
previously.

float nuc_edge_5_mag: Magnitude of 5x1 and 1x5 closing residues within
the nuclei. Square root of ((average horizontal residue)^2 + (average
vertical residue)^2).

float sm_edge_5_mag: Magnitude of 5x1 and 1x5 closing residues within
the small ring. Square root of ((average horizontal residue)^2 +
(average vertical residue)^2).

float big_edge_5_mag: Magnitude of 5x1 and 1x5 closing residues within
the big ring. Square root of ((average horizontal residue)^2 + (average
vertical residue)^2).

float nuc_edge_5_mag_sm: nuc_edge_5_mag / sm_edge_5_mag.

float nuc_edge_5_mag_big: nuc_edge_5_mag / big_edge_5_mag.

float nuc_edge_5_dir: Directional disparity of 5x1 and 1x5 closing
residues within the nuclei. (average vertical residue) / ((average
horizontal residue) + (average vertical residue)).

float sm_edge_5_dir: Directional disparity of 5x1 and 1x5 closing
residues in the small ring. (average vertical residue) / ((average
horizontal residue) + (average vertical residue)).

float big_edge_5_dir: Directional disparity of 5x1 and 1x5 closing
residues in the big ring. (average vertical residue) / ((average
horizontal residue) + (average vertical residue)).

float nuc_edge_5_dir_sm: nuc_edge_5_dir / sm_edge_5_dir.

float nuc_edge_5_dir_big: nuc_edge_5_dir / big_edge_5_dir.

9 Mag

The last of the angular structuring elements used are a 9x1 and 1x9. In
this case, the residue is calculated with the 5x1 or 1x5 closed images
rather than the 2x1 or 1x2 closed images described for the 5x1 and 1x5
elements.

float nuc_edge_9_mag: Magnitude of 9x1 and 1x9 closing residues within
the nuclei. Square root of ((average horizontal residue)^2 + (average
vertical residue)^2).

float sm_edge_9_mag: Magnitude of 9x1 and 1x9 closing residues within
the small ring. Square root of ((average horizontal residue)^2 +
(average vertical residue)^2).

float big_edge_9_mag: Magnitude of 9x1 and 1x9 closing residues within
the big ring. Square root of ((average horizontal residue)^2 + (average
vertical residue)^2).

float nuc_edge_9_mag_sm: nuc_edge_9_mag / sm_edge_9_mag.

float nuc_edge_9_mag_big: nuc_edge_9_mag / big_edge_9_mag.

float nuc_edge_9_dir: Directional disparity of 9x1 and 1x9 closing
residues within the nuclei. (average vertical residue) / ((average
horizontal residue) + (average vertical residue)).

float sm_edge_9_dir: Directional disparity of 9x1 and 1x9 closing
residues in the small ring. (average vertical residue) / ((average
horizontal residue) + (average vertical residue)).

float big_edge_9_dir: Directional disparity of 9x1 and 1x9 closing
residues in the big ring. (average vertical residue) / ((average
horizontal residue) + (average vertical residue)).

float nuc_edge_9_dir_sm: nuc_edge_9_dir / sm_edge_9_dir.

float nuc_edge_9_dir_big: nuc_edge_9_dir / big_edge_9_dir.

Blur 

As another measure of texture, the original is blurred using a 5x5
binomial filter. A residue is created with the absolute magnitude
differences between the original and the blurred image.
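The blur residue can be sketched as below. The 5x5 binomial kernel is taken here as the outer product of the coefficients [1, 4, 6, 4, 1], normalized by 256; replicating border pixels at the image edge and the function names are assumptions made for this example.

```python
KERNEL_1D = [1, 4, 6, 4, 1]  # binomial coefficients, sum is 16


def binomial_blur(img):
    """Blur a 2D list with the 5x5 binomial kernel (border pixels replicated)."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy, ky in enumerate(KERNEL_1D, start=-2):
                for dx, kx in enumerate(KERNEL_1D, start=-2):
                    yy = min(max(y + dy, 0), h - 1)  # clamp to image rows
                    xx = min(max(x + dx, 0), w - 1)  # clamp to image columns
                    acc += ky * kx * img[yy][xx]
            out[y][x] = acc / 256.0
    return out


def blur_residue(img):
    """Absolute difference between the original and the blurred image."""
    blurred = binomial_blur(img)
    return [[abs(v - b) for v, b in zip(vrow, brow)]
            for vrow, brow in zip(img, blurred)]
```

A flat region produces a residue of zero everywhere, so the statistics of the residue (average, standard deviation, skewness, kurtosis) respond only to local texture.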

float nuc_blur_ave: Average of blur image over label mask.

float nuc_blur_sd: Standard deviation of blur image over label mask.

float nuc_blur_sk: Skewness of blur image over label mask.

float nuc_blur_ku: Kurtosis of blur image over label mask.

float sm_blur_ave: Average of blur image over small ring.

float sm_blur_sd: Standard deviation of blur image over small ring.

float sm_blur_sk: Skewness of blur image over small ring.

float sm_blur_ku: Kurtosis of blur image over small ring.

float big_blur_ave: Average of blur image over big ring.

float big_blur_sd: Standard deviation of blur image over big ring.

float big_blur_sk: Skewness of blur image over big ring.

float big_blur_ku: Kurtosis of blur image over big ring.

float nuc_blur_ave_sm: Average of blur residue for the nuclei divided by
the small ring.

float nuc_blur_sd_sm: Standard deviation of blur residue for the nuclei
divided by the small ring.

float nuc_blur_sk_sm: Skew of blur residue for the nuclei divided by the
small ring.

float nuc_blur_ave_big: Average of blur residue for the nuclei divided
by the big ring.

float nuc_blur_sd_big: Standard deviation of blur residue for the nuclei
divided by the big ring.

float nuc_blur_sk_big: Skew of blur residue for the nuclei divided by
the big ring.

float mod_N_C_ratio: A ratio between the nuclear area and the cytoplasm
area is calculated. The cytoplasm for each nucleus is determined by
taking only the cytoplasm area that falls inside of a skiz boundary
between all nuclei objects. The area of the cytoplasm is the number of
cytoplasm pixels that are in the skiz area corresponding to the nucleus
of interest. The edge of the image is treated as an object and therefore
creates a skiz boundary.

float mod_nuc_OD: The average optical density of the nuclei is
calculated using floating point representations for each pixel optical
density rather than the integer values as implemented in the first
version. The optical density values are scaled so that a value of 1.2 is
given for pixels of 5 or fewer counts and a value of 0.05 for pixel
values of 245 or greater. The pixel values between 5 and 245 span the
range logarithmically to meet each boundary condition.
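One way to read the scaling described for mod_nuc_OD is a curve that is linear in the logarithm of the pixel count and clamped at both ends. The sketch below follows that reading; the exact functional form is an assumption (the description only states the two boundary values and that the span between them is logarithmic), and the function names are hypothetical.

```python
import math


def pixel_od(count):
    """Map a pixel count (0..255) to optical density on a clamped log scale."""
    lo_count, hi_count = 5, 245
    od_at_lo, od_at_hi = 1.2, 0.05
    if count <= lo_count:
        return od_at_lo
    if count >= hi_count:
        return od_at_hi
    # Linear in log(count), so the curve meets both boundary values.
    frac = ((math.log(count) - math.log(lo_count))
            / (math.log(hi_count) - math.log(lo_count)))
    return od_at_lo + frac * (od_at_hi - od_at_lo)


def mean_nuclear_od(counts):
    """Average floating point optical density over the nuclear pixels."""
    return sum(pixel_od(c) for c in counts) / len(counts)
```

Under this assumed form, a count of 35 (the geometric mean of 5 and 245) maps to the midpoint of the optical density range, 0.625.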

float mod_nuc_IOD: The summation of the optical density values for each
pixel within the nuclei.

float mod_nuc_OD_sm: The average optical density of the nuclei minus the
average optical density of the small ring.






float mod_nuc_OD_big: The average optical density of the nuclei minus
the average optical density of the big ring.

float mod_nuc_IOD_sm: mod_nuc_OD_sm * number of pixels in the nuclei.
Essentially, this is the integrated optical density of the nuclei
normalized by the average optical density of the pixels within the small
ring around the nuclei.

float mod_nuc_IOD_big: mod_nuc_OD_big * number of pixels in the nuclei.
Same as above, except the average optical density in the big ring around
the nuclei is used to normalize the data.



These features are the result of placing each pixel in the nuclear mask
area in a histogram where each bin represents a range of optical
densities. The numbers should be read as 1_2 = 1.2, 0_825 = 0.825.

The original image is represented as transmission values. These values
are converted during the binning process to show equal size bins in
terms of optical density, which is a log transformation of the
transmission. The Histogram bins refer to the histogram of pixels of
transmission values within the nuclear mask.

float OD_bin_1_2: Sum Histogram bins #0 - 22 / Area of label mask.

float OD_bin_1_125: Sum Histogram bins #13 / Area of label mask.

float OD_bin_1_05: Sum Histogram bins #23 - 26 / Area of label mask.

float OD_bin_0_975: Sum Histogram bins #27 - 29 / Area of label mask.

float OD_bin_0_9: Sum Histogram bins #30 - 34 / Area of label mask.

float OD_bin_0_825: Sum Histogram bins #35 - 39 / Area of label mask.

float OD_bin_0_75: Sum Histogram bins #40 - 45 / Area of label mask.

float OD_bin_0_675: Sum Histogram bins #46 - 53 / Area of label mask.

float OD_bin_0_6: Sum Histogram bins #54 - 62 / Area of label mask.

float OD_bin_0_525: Sum Histogram bins #63 - 73 / Area of label mask.

float OD_bin_0_45: Sum Histogram bins #74 - 86 / Area of label mask.

float OD_bin_0_375: Sum Histogram bins #87 - 101 / Area of label mask.

float OD_bin_0_3: Sum Histogram bins #102 - 119 / Area of label mask.

float OD_bin_0_225: Sum Histogram bins #120 - 142 / Area of label mask.

float OD_bin_0_15: Sum Histogram bins #143 - 187 / Area of label mask.

float OD_bin_0_075: Sum Histogram bins #188 - 255 / Area of label mask.

float context_3a: For this feature, the bounding box of the nucleus is
expanded by 15 pixels on each side. The feature is the ratio of the area
of other segmented objects which intersect the enlarged box to the
compactness of the box, where the compactness is defined as the
perimeter of the box squared divided by the area of the box.

float hole_percent: The segmentation is done in several steps. At an
intermediate step, the nuclear mask contains holes which are later
filled in to make the mask solid. This feature is the ratio of the area
of the holes to the total area of the final, solid mask.

float context_1b: For this feature, the bounding box of the nucleus is
expanded by 5 pixels on each side. The feature is the ratio of the area
of other segmented objects which intersect the enlarged box to the total
area of the enlarged box.
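The two context features can be sketched as follows. The description leaves open whether the whole area of an intersecting object or only its overlapping part is counted; this sketch assumes the whole area, tests intersection on bounding boxes, and uses hypothetical names and a hypothetical (area, bounding_box) representation for the other segmented objects.

```python
def expand(box, margin):
    """Grow a box (x0, y0, x1, y1) by `margin` pixels on each side."""
    x0, y0, x1, y1 = box
    return (x0 - margin, y0 - margin, x1 + margin, y1 + margin)


def intersects(a, b):
    """True if two boxes overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def neighbor_area(box, others):
    """Total area of the other objects whose bounding boxes meet `box`."""
    return sum(area for area, obox in others if intersects(box, obox))


def context_small(nucleus_box, others):
    """Neighbor area over the area of the box enlarged by 5 pixels."""
    big = expand(nucleus_box, 5)
    box_area = (big[2] - big[0]) * (big[3] - big[1])
    return neighbor_area(big, others) / box_area


def context_large(nucleus_box, others):
    """Neighbor area over the compactness of the box enlarged by 15 pixels."""
    big = expand(nucleus_box, 15)
    width, height = big[2] - big[0], big[3] - big[1]
    compactness = (2 * (width + height)) ** 2 / (width * height)
    return neighbor_area(big, others) / compactness
```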

float min_distance: The distance to the centroid of the nearest object
from the centroid of the current object.

The invention Results Descriptions

This section shows all of the results of the invention that are written
to the results structure TwentyXResult, which is contained in
alh_twentyx.h.

int high_count: Measures dark edge gradient content of the whole
original image. This is a measure of how much cellular material may be
in the image.

int high_mean: The average value of all pixels in an image that have
values between 199 and 250. This feature provides some information about
an image's background.

int medium_threshold: lower_limit_0 - lower_limit_1, where lower_limit_0
is the value of the low_threshold + 30, or 70, whichever is greater.
lower_limit_1 is the value of high_mean - 40, or 150, whichever is
greater.

int low_threshold: The low threshold value is the result of an adaptive
threshold calculation for a certain range of pixel intensities in an
image during the segmentation process. It gives a measure for how much
dark matter there is in an image. If the threshold is low, there is a
fair amount of dark matter in the image. If the threshold is high, there
are probably few high density objects in the image.

float time1: Time variables which may be set during the invention
processing.

float time2: Same as time1.

float time3: Same as time1.

float time4: Same as time1.

float stain_mean_od: The cumulative value of mean_od for all objects
identified as intermediate cells.

float stainsq_mean_od: The cumulative squared value of mean_od for all
objects identified as intermediate cells.

float stain_sd_orig2: The cumulative value of sd_orig2 for all objects
identified as intermediate cells.

float stainsq_sd_orig2: The cumulative squared value of sd_orig2 for all
objects identified as intermediate cells.

float stain_nc_contrast_orig: The cumulative value of nc_contrast_orig
for all objects identified as intermediate cells.

float stainsq_nc_contrast_orig: The cumulative squared value of
nc_contrast_orig for all objects identified as intermediate cells.

float stain_mean_outer_od_r3: The cumulative value of mean_outer_od_r3
for all objects identified as intermediate cells.

float stainsq_mean_outer_od_r3: The cumulative squared value of
mean_outer_od_r3 for all objects identified as intermediate cells.

float stain_nuc_blur_ave: The cumulative value of nuc_blur_ave for all
objects identified as intermediate cells.

float stainsq_nuc_blur_ave: The cumulative squared value of nuc_blur_ave
for all objects identified as intermediate cells.

float stain_edge_contrast_orig: The cumulative value of
edge_contrast_orig for all objects identified as intermediate cells.

float stainsq_edge_contrast_orig: The cumulative squared value of
edge_contrast_orig for all objects identified as intermediate cells.
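A cumulative value paired with a cumulative squared value is sufficient to recover a mean and a standard deviation once the number of contributing objects is known. The sketch below shows that arithmetic; it is an assumption about how such paired sums could be used, not a statement of how the system uses them, and the function name and the count argument `n` are hypothetical.

```python
import math


def mean_and_sd(cumulative, cumulative_sq, n):
    """Recover mean and population standard deviation from running sums.

    cumulative    -- sum of the feature over n objects
    cumulative_sq -- sum of the squared feature over the same n objects
    """
    if n == 0:
        return 0.0, 0.0
    mean = cumulative / n
    # E[x^2] - (E[x])^2, clamped at zero to absorb rounding error.
    variance = max(cumulative_sq / n - mean ** 2, 0.0)
    return mean, math.sqrt(variance)
```

For the eight values 2, 4, 4, 4, 5, 5, 7, 9 the sums are 40 and 232, which give a mean of 5.0 and a standard deviation of 2.0.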

int intermediate_hist1[10][6]: Histogram representing the features of
all intermediate cells identified by the first classifier. 10 bins for
IOD, and 6 for nuclear area.

int intermediate_hist2[8][6]: Histogram representing the features of all
intermediate cells identified by the second classifier. 8 bins for IOD,
and 6 for nuclear area.

int sil_box1_artifact_count: Total number of objects in the image
classified as artifacts by the Box1 classifier.

int sil_box2_artifact_count: Total number of objects in the image
classified as artifacts by the Box2 classifier.

int sil_box3_artifact_count: Total number of objects in the image
classified as artifacts by the first classifier of the Artifact Filter.

int sil_box4_artifact_count: Total number of objects in the image
classified as artifacts by the second classifier of the Artifact Filter.

int sil_box5_artifact_count: Total number of objects in the image
classified as artifacts by the third classifier of the Artifact Filter.

int conCompCount: The number of objects segmented in the image.

int sil_stage1_normal_count1: Total number of objects classified as
normal at the end of the Stage1 classifier.

int sil_stage1_artifact_count1: Total number of objects classified as
artifact at the end of the Stage1 classifier.

int sil_stage1_abnormal_count1: Total number of objects classified as
abnormal at the end of the Stage1 classifier.

int sil_stage2_normal_count1: Total number of objects classified as
normal at the end of the Stage2 94 classifier.

int sil_stage2_artifact_count1: Total number of objects classified as
artifact at the end of the Stage2 94 classifier.

int sil_stage2_abnormal_count1: Total number of objects classified as
abnormal at the end of the Stage2 94 classifier.

int sil_stage3_normal_count1: Total number of objects classified as
normal at the end of the stage3 96 classifier.

int sil_stage3_artifact_count1: Total number of objects classified as
artifact at the end of the stage3 96 classifier.

int sil_stage3_abnormal_count1: Total number of objects classified as
abnormal at the end of the stage3 96 classifier.

int sil_cluster_stage2_count: The number of objects classified as
abnormal by the Stage2 94 classifier which are close to abnormal objects
from the stage3 96 classifier.

int sil_cluster_stage1_count: The number of objects classified as
abnormal by the Stage1 classifier which are close to abnormal objects
from the Stage2 94 classifier.

float sil_est_cellcount: An estimate of the number of squamous cells in
the image.

int sil_stage2_alarm_IOD_histo[16]: Histogram representing the IOD of
all objects classified as abnormal by the Stage2 94 classifier.

int sil_stage2_alarm_conf_hist[10]: Histogram representing the
confidence of classification for all objects classified as abnormal by
the Stage2 94 classifier.

int sil_stage3_alarm_IOD_histo[16]: Histogram representing the IOD of
all objects classified as abnormal by the stage3 96 classifier.

int sil_stage3_alarm_conf_hist[10]: Histogram representing the
confidence of classification for all objects classified as abnormal by
the stage3 96 classifier.

int sil_Btagel_normal_count2: Total number of 

objects classified as normal by the Stagel Box 

10 classifier. 

int Bi l_stagel_abnormal_count2: Total number 

of objects classified as abnormal by the Stagel Box 

classifier. 

int sil_atagel_artifact_count2: Total number 

15 of objects classified as artifact by the Stagel Box 

classifier . 

int sil_pl_stage2_normal_count2: Total number
of objects classified as normal by the Stage2 94 Box
classifier.

int sil_pl_stage2_abnormal_count2: Total
number of objects classified as abnormal by the
Stage2 94 Box classifier.

int sil_pl_stage2_artifact_count2: Total
number of objects classified as artifact by the
Stage2 94 Box classifier.

int sil_pl_stage3_normal_count2: Total number
of objects classified as normal by the stage3 96 Box
classifier.

int sil_pl_stage3_abnormal_count2: Total
number of objects classified as abnormal by the
stage3 96 Box classifier.

int sil_pl_stage3_artifact_count2: Total
number of objects classified as artifact by the
stage3 96 Box classifier.

int sil_stage4_alarm_count: Total number of
objects classified as abnormal by the stage4 98
classifier.

int sil_stage4_prob_hist[12]: Histogram
representing the confidence of classification for
all objects classified as abnormal by the stage4 98
classifier.

int sil_ploidy_alarm_count1: Total number of
objects classified as abnormal by the first ploidy
classifier 100.

int sil_ploidy_alarm_count2: Total number of
objects classified as abnormal by the second ploidy
classifier 100.

int sil_ploidy_prob_hist[12]: Histogram
representing the confidence of classification for
all objects classified as abnormal by the ploidy
classifier 100.

int sil_S4_and_P1_count: Total number of
objects classified as abnormal by both the stage4 98
and the first ploidy classifier 100.

int sil_S4_and_P2_count: Total number of
objects classified as abnormal by both the stage4 98
and the second ploidy classifier 100.

int atypical_pdf_index[8][8]: A 2D histogram
representing two confidence measures of the objects
classified as abnormal by the Stage2 94 Box
classifier. Refer to the description of the
atypicality classifier in this document.

int sil_seg_x_s2_decisive[4]: A 4 bin
histogram of the product of the segmentation
robustness value and the Stage2 94 decisiveness
value.

int sil_seg_x_s3_decisive[4]: A 4 bin
histogram of the product of the segmentation
robustness value and the stage3 96 decisiveness
value.

int sil_s2_x_s3_decisive[4]: A 4 bin histogram
of the product of the Stage2 94 decisiveness value
and the stage3 96 decisiveness value.

int sil_seg_x_s2_x_s3_decisive[4]: A 4 bin
histogram of the product of the segmentation
robustness value, the Stage2 94 decisiveness value,
and the stage3 96 decisiveness value.
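
The 4 bin product histograms above can be illustrated with a short sketch in C. It assumes that each robustness or decisiveness value lies in the range 0 to 1 and that the four bins are of equal width. Neither assumption is stated in this passage, and the names uniform_bin, add_product_to_hist and DECISIVE_BINS are hypothetical.

```c
#include <assert.h>

#define DECISIVE_BINS 4  /* matches the [4] in the declarations above */

/* Map a value assumed to lie in [0, 1] to one of n_bins equal-width
 * bins. Values outside the range are clamped to the end bins. */
int uniform_bin(double value, int n_bins)
{
    assert(n_bins > 0);
    if (value <= 0.0) return 0;
    if (value >= 1.0) return n_bins - 1;
    return (int)(value * n_bins);
}

/* Add one object to a histogram of the product of two measures,
 * for example segmentation robustness and Stage2 decisiveness. */
void add_product_to_hist(int hist[DECISIVE_BINS], double a, double b)
{
    hist[uniform_bin(a * b, DECISIVE_BINS)]++;
}
```

For the three-way product, the same helper could be called with the product of the first two values as `a` and the third value as `b`.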

int sil_stage2_dec_x_seg[4][4]: A 4x4 array of
Stage2 94 decisiveness (vertical axis) vs.
segmentation robustness (horizontal axis).

int sil_stage3_dec_x_seg[4][4]: A 4x4 array of
stage3 96 decisiveness (vertical axis) vs.
segmentation robustness (horizontal axis).

int sil_s3_x_s2_dec_x_seg[4][4]: A 4x4 array
of the product of Stage2 94 and stage3 96
decisiveness (vertical axis) vs. segmentation
robustness (horizontal axis).

int sil_s3_x_segrobust_x_s2pc[4][4]: A 4x4
array of the product of segmentation robustness and
stage3 96 decisiveness (vertical axis) vs. the
product of Stage2 94 confidence and Stage2 94
decisiveness (horizontal axis).

int sil_s3_x_segrobust_x_s3pc[4][4]: A 4x4
array of the product of segmentation robustness and
stage3 96 decisiveness (vertical axis) vs. the
product of stage3 96 confidence and stage3 96
decisiveness (horizontal axis).
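
The 4x4 arrays above are two-dimensional histograms. The sketch below, in C, shows one way such a table could be filled. It assumes that both measures lie in the range 0 to 1, that the bins are of equal width, and that the first array index is the vertical axis and the second the horizontal axis. These are illustrative assumptions, and the names bin4, add_to_dec_x_seg and N_BINS are hypothetical.

```c
#include <assert.h>

#define N_BINS 4

/* Equal-width binning of a value assumed to lie in [0, 1]. */
static int bin4(double v)
{
    if (v <= 0.0) return 0;
    if (v >= 1.0) return N_BINS - 1;
    return (int)(v * N_BINS);
}

/* Row index = decisiveness (vertical axis),
 * column index = segmentation robustness (horizontal axis). */
void add_to_dec_x_seg(int table[N_BINS][N_BINS],
                      double decisiveness, double seg_robustness)
{
    assert(table != 0);
    table[bin4(decisiveness)][bin4(seg_robustness)]++;
}
```

The arrays whose axes are products of two measures could be filled the same way by passing each product as a single argument.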

float sil_stage3_ftr[NUM_FOV_ALM][LEN_FOV_FTR]:
A set of 8 features for an object which was
classified as abnormal by the stage3 96 classifier.
NUM_FOV_ALM refers to the number of the alarm as it
was detected in the 20x scan (up to 50 will have
features recorded). LEN_FOV_FTR refers to the
feature number: 0-7.
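
As an illustration of the feature table just described, the following C sketch stores the 8 features of an alarm only while the alarm number is below the limit of 50. The constant values follow the text above; the function name, the zero-based alarm numbering and the return convention are assumptions made for the example.

```c
#include <assert.h>

#define NUM_FOV_ALM 50  /* "up to 50 will have features recorded" */
#define LEN_FOV_FTR 8   /* feature numbers 0-7 */

/* Copy the features of one alarm into the table. Returns 1 if the
 * features were recorded and 0 if the alarm number is out of range.
 * Alarm numbers are assumed to start at 0. */
int record_stage3_features(float table[NUM_FOV_ALM][LEN_FOV_FTR],
                           int alarm_number,
                           const float features[LEN_FOV_FTR])
{
    assert(table != 0 && features != 0);
    if (alarm_number < 0 || alarm_number >= NUM_FOV_ALM)
        return 0;
    for (int i = 0; i < LEN_FOV_FTR; i++)
        table[alarm_number][i] = features[i];
    return 1;
}
```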
Cell Types Recognized by the Invention

The invention has been trained to recognize
single or free lying cell types: normal, potentially
abnormal, and artifacts that typically appear in
Papanicolaou-stained cervical smears. This section
lists the cell types that were used to train the
invention.

Normal Single Cells

single superficial squamous
single intermediate squamous
single squamous metaplastic
single parabasal squamous
single endocervical
single endometrial
red blood cells

Abnormal Single Cells

single atypical squamous
single atypical metaplastic
single atypical endocervical columnar
single atypical endometrial
single low grade SIL
single high grade SIL
single endocervical columnar dysplasia, well segmented
single carcinoma in situ, endocervical columnar, well segmented
single adenocarcinoma, endocervical columnar
single adenocarcinoma, endometrial
single adenocarcinoma, metaplastic
single invasive carcinoma, small cell squamous
single invasive carcinoma, large cell squamous
single invasive carcinoma, keratinizing squamous
single marked repair/reactive squamous
single marked repair/reactive, endocervical
single marked repair/reactive, metaplastic
single herpes
single histiocyte
single lymphocyte
single slightly enlarged superficial squamous
single slightly enlarged intermediate squamous
single slightly enlarged metaplastic squamous
single slightly enlarged parabasal squamous
slightly enlarged endocervical

Artifacts

single air dried intermediate cell nucleus
single air dried metaplastic/parabasal cell nucleus
single air dried endocervical cell nucleus
single questionable abnormal cell nucleus
single over segmented intermediate cell nucleus
single over segmented metaplastic/parabasal cell nucleus
single artifact, 1 nucleus over segmented
artifact, 2 nuclei
artifact, 3+ nuclei
single folded cytoplasm
cytoplasm only
bare nucleus
unfocused
polymorphs (white blood cells)
graphites
corn flaking
mucous
junk from cover slip
other junk

The invention has been described herein in
considerable detail in order to comply with the
Patent Statutes and to provide those skilled in the
art with the information needed to apply the novel
principles and to construct and use such specialized
components as are required. However, it is to be
understood that the invention can be carried out by
specifically different equipment and devices, and
that various modifications, both as to the equipment
details and operating procedures, can be
accomplished without departing from the scope of the
invention itself.

What is claimed is:

CLAIMS

1. A cell identification apparatus for identifying
object types of interest, the apparatus
comprising:

(a) an image segmenter means (10) for
processing at least one image (11) of a
biological specimen having a segmented
image output;

(b) feature calculation means (12) for
computing features having at least one
feature output; and

(c) means for classifying objects (14),
connected to receive the at least one
feature output, having a classified output
where the classified output identifies
objects (80) as being object types of
interest.

2. The apparatus of claim 1 wherein the feature
calculation means (12) comprises an object
feature extractor.

3. The apparatus of claim 1 wherein the feature
calculation means (12) comprises a contextual
feature extractor.

4. The apparatus of claim 1 wherein the feature
calculation means (12) comprises a whole image
feature extractor.

5. The apparatus of claim 1 wherein the objects
(80, 82) comprise free-lying cells.

6. The apparatus of claim 1 wherein the objects
(80, 82) comprise non-nuclear overlapped cells.

7. The apparatus of claim 1 wherein the object
types of interest (80, 82) comprise normal
cells, abnormal cells or artifacts.

8. The apparatus of claim 7 wherein the normal
cells comprise reference intermediate cells
(142).

9. The apparatus of claim 7 wherein the abnormal
cells comprise cancerous and precancerous
cells.

10. The apparatus of claim 1 wherein the biological
specimen is a specimen prepared by the
Papanicolaou method.

11. The apparatus of claim 1 wherein the biological
specimen is a gynecological specimen.

12. The apparatus of claim 1 further comprising a
means for accumulating the classified output
(18).






13. The apparatus of claim 1 comprising a means for
measuring a stain (92) of at least one type of
object (142, 144, 146, 148).

14. The apparatus of claim 13 wherein the at least
one type of object (80) comprises reference
intermediate cells (142).

15. The apparatus of claim 1 further comprising a
means for measuring a classification confidence
(216) for a set of objects (80, 82) classified
as being object types of interest (80, 82).

16. The apparatus of claim 1 further comprising a
means for measuring a reliability of object
segmentation (24).

17. The apparatus of claim 1 further comprising a
means for measuring repeatability of
classification results (Figure 7B).

18. A free-lying cell segmenter (10) comprising:

(a) a means for acquiring at least one image
(28) of a biological specimen having an
image output (29);

(b) a means for creating a contrast enhanced
image (30) having an enhanced image output
(31) wherein the means for creating a
contrast enhanced image (30) is connected
to receive the at least one image (29);

(c) a means for image thresholding (32) having
an image threshold output (33) wherein the
means for image thresholding (32) is
connected to receive the contrast enhanced
image (31); and

(d) a means for object refinement (34) having
a refined object output wherein the means
for object refinement (34) is connected to
receive the thresholded image output (33).

19. A feature classifier for performing a plurality
of stages of feature extraction (12) and object
classification (14) on cells in a biological
specimen comprising:

(a) means for acquiring at least one image
(28) of a biological specimen;

(b) an initial stage classifier means (90) for
determining whether objects (80, 82) in
the at least one image are object types of
interest and other objects; and

(c) a sequence of object classifiers (92, 94,
96, 98, 100) wherein each object
classifier has an object type of interest
input, an object type of interest output
and an other object type output, and
wherein the object type of interest output
is connected to the object type of
interest input of a next classifier (92,
94, 96, 98, 100) in the sequence.

20. The apparatus of claim 19 further comprising:

(a) an initial box filter means (90) for
determining whether objects (80, 82) are
normal, potentially abnormal or artifacts;

(b) a stage 1 classifier means (92) for
processing the normal and potentially
abnormal objects into a potentially
abnormal, artifact or normal object;

(c) a stage 2 classifier means (94) for
determining whether the potentially
abnormal objects from the stage 1
classifier (92) are potentially abnormal,
artifact or normal;

(d) a stage 3 classifier (96) for determining
whether the potentially abnormal objects
from the stage 2 classifier (94) are
potentially abnormal or are normal and
artifact objects;

(e) a stage 4 classifier (98) for determining
whether the potential abnormal objects
from the stage 3 classifier (96) are
potentially abnormal or normal artifacts.

21. The apparatus of claim 19 further comprising a
diagnostic classifier means (100) for
determining whether the objects of interest
(80, 82) from a final classifier (96) in the
sequence of classifiers are low grade squamous
intraepithelial lesions, potential high grade
squamous intraepithelial lesions, cancerous
lesions and normal artifacts.

22. The apparatus of claim 19 wherein the object
types of interest (80, 82) comprise normal
cells (142), abnormal cells and artifacts.

23. The apparatus of claim 22 wherein the normal
cells (142) comprise reference intermediate
cells.

24. The apparatus of claim 22 wherein the abnormal
cells comprise cancerous and precancerous
cells.

25. The apparatus of claim 19 wherein the
biological specimen is a specimen prepared by
the Papanicolaou method.

26. The apparatus of claim 19 wherein the
biological specimen is a gynecological
specimen.

27. The apparatus of claim 19 further comprising a
means for computing (94) an atypicality index
(22).

28. The apparatus of claim 20 wherein the initial
box filter (90) further comprises a filter
selected from the group consisting of a dark
object filter (104), an unfocused object filter
(106), a polymorphonuclear leukocytes filter, a
graphite filter (108), and a cytoplasm filter
(110).

29. The apparatus of claim 19 wherein at least one
of the classifiers in the sequence of object
classifiers (90, 92, 94, 96, 98, 100) comprises
a box filter (90).



30. The apparatus of claim 19 wherein at least one
of the classifiers in the sequence of object
classifiers (90, 92, 94, 96, 98, 100) comprises
a decision tree classifier (Figure 7B).

31. The apparatus of claim 19 wherein at least one
of the classifiers in the sequence of object
classifiers (90, 92, 94, 96, 98, 100) comprises
a binary decision tree classifier (Figure 7B).

32. The apparatus of claim 19 wherein at least one
of the classifiers in the sequence of object
classifiers (90, 92, 94, 96, 98, 100) comprises
a fuzzy classifier.

33. The apparatus of claim 19 wherein at least one
of the classifiers in the sequence of object
classifiers (90, 92, 94, 96, 98, 100) comprises
a non-parametric classifier.



34. The apparatus of claim 19 wherein at least one
of the classifiers (Figure 8) in the sequence
of object classifiers (90, 92, 94, 96, 98, 100)
further comprises means for measuring
confidence (216).

35. The apparatus of claim 20 wherein the stage 4
classifier (98) comprises:

(a) a feature combination classifier (202) for
classifying objects as normal or abnormal;

(b) a means for computing a probability (210)
of abnormal objects being abnormal;

(c) a means for combining (206) a second set
of features to determine whether the
object is classified as normal or
abnormal;

(d) a means for computing a probability (214)
of the object being abnormal; and

(e) a means for combining (216) the first
probability (210) and the second
probability (214) to produce a final
confidence factor.

36. The apparatus of claim 21 wherein the
diagnostic classifier, being a ploidy
classifier, further comprises:

(a) means for computing a probability that the
object is abnormal (224);

(b) means for computing whether the object is
classified as aneuploid (230);

(c) means for computing a probability that the
object is aneuploid (232); and

(d) means for combining the first probability
and the second probability to provide a
final confidence (234).

37. The apparatus of claim 19 further including a
plurality of computer processors (540) wherein
the plurality of computer processors (540)
perform multilayered processing.

38. An apparatus for computing a stain score from a
biological specimen comprising:

(a) means for acquiring at least one image
(28) of a biological specimen;

(b) means for classifying objects (14) that
are object types of interest (142, 144,
146, 148) in the at least one image (28),
wherein the means for classifying objects
(14) provides a classified object output;

(c) means for measuring stain feature values
(92) from the objects of interest (142,
144, 146, 148), connected to the
classified object output, wherein the
means for measuring stain feature values
(92) has a stain feature value output; and

(d) means for accumulating stain feature
values (18) connected to the stain feature
value output, and wherein the means for
accumulating stain feature values (18)
generates a stain score output (21).

39. The apparatus of claim 38 wherein the stain
feature values (21) comprise a density of an
object of interest (142, 144, 146, 148).

40. The apparatus of claim 38 wherein the stain
feature values (21) comprise texture of the
object of interest (142, 144, 146, 148).

41. The apparatus of claim 38 wherein the stain
feature (21) comprises a difference in at least
one feature of the objects of interest (142,
144, 146, 148) and at least one feature
measurement of the background of the objects of
interest.

42. An apparatus for measuring the repeatability of
classification for a biological specimen
comprising:

(a) means for acquiring at least one image
(10) of a biological specimen;

(b) means, connected to receive the at least
one image, for computing object features
(12) having an object features output;

(c) means for classifying objects (14)
connected to the object features output,
wherein the means for classifying objects
provides a classified object output;

(d) means for estimating a classification
repeatability (Figure 7B) of object types,
connected to the classified object output
and object features output, wherein the
means for estimating (Figure 7B) has a
classification repeatability output.

43. The apparatus of claim 42 wherein the means for
estimating the classification repeatability
(Figure 7B) further comprises feature distance
measuring means for computing a distance from a
feature value to a classification boundary
(Figure 6B) of the objects of interest.

44. An apparatus for measuring the reliability for
object segmentation of a biological specimen
comprising:

(a) means for acquiring at least one image
(28) of a biological specimen having an
image output (29);

(b) means for image segmentation (10)
connected to the image output (29) to
detect objects of interest (80, 82),
wherein the means for image segmentation
(10) has a segmented object output;

(c) means for feature extraction (12)
connected to the segmented object output,
wherein the means for feature extraction
(12) has a segmentation reliability
feature output (24);

(d) means for classification of objects (14)
connected to the segmentation reliability
feature output (24) having a classified
output (216), where the classified output
(216) comprises a measure of the
reliability of the segmented object
output.

45. A feature classification process for performing
a plurality of stages of feature extraction and
object classification on cells in a biological
specimen comprising:

(a) an initial box filter means (90) for
determining whether objects (80, 82) are
normal and potentially abnormal or
artifacts;

(b) a stage 1 classifier means (92) for
processing the normal and potentially
abnormal objects into a potentially
abnormal, artifact or normal object;

(c) a stage 2 classifier means (94) for
determining whether the potentially
abnormal objects from the stage 1
classifier (92) are potentially abnormal,
artifact or normal;

(d) a stage 3 classifier (96) for determining
whether the potentially abnormal objects
from the stage 2 classifier (94) are
potentially abnormal or are normal and
artifact objects; and

(e) a stage 4 classifier (98) for determining
whether the potential abnormal objects
from the stage 3 classifier (96) are
potentially abnormal or are normal
artifacts.

46. The apparatus of claim 27 further comprising a
diagnostic classifier means (100) for
determining whether the objects of interest
(80, 82) in the output of the stage 3
classifier (96) are low grade squamous
intraepithelial lesions, potential high grade
squamous intraepithelial lesions, cancerous
lesions or normal artifacts.
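
The staged arrangement recited in the classifier claims above, in which the object type of interest output of each classifier feeds the input of the next, can be sketched in C as a simple chain of function calls. This is only an illustration under stated assumptions: the Object type, the StageFn signature, the two example stages and their thresholds are hypothetical and do not come from the specification.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical object record holding a small feature vector. */
typedef struct {
    double features[4];
} Object;

/* A stage returns nonzero if the object remains an object type of
 * interest, and zero if it goes to the "other object" output. */
typedef int (*StageFn)(const Object *obj);

/* Run the object through the stages in order. Returns the number of
 * stages passed; a value equal to n_stages means every classifier in
 * the sequence kept the object as an object type of interest. */
size_t run_cascade(const StageFn *stages, size_t n_stages, const Object *obj)
{
    size_t passed = 0;

    assert(stages != NULL && obj != NULL);
    while (passed < n_stages && stages[passed](obj))
        passed++;
    return passed;
}

/* Two toy stages with made-up thresholds, for illustration only. */
int example_stage_a(const Object *obj) { return obj->features[0] > 0.5; }
int example_stage_b(const Object *obj) { return obj->features[1] > 0.5; }
```

Because a rejected object never reaches the later stages, only the objects that remain of interest incur the cost of the later feature tests.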